Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the specification; 

The 3GPP transparent end-to-end packet-switched streaming service (PSS) specification consists of three 3GPP TSs; 
3GPP TS 22.233 [1], 3GPP TS 26.233 [2] and the present document. The first TS contains the service requirements for 
the PSS, the second TS provides an overview of the 3GPP PSS and the present document the details of protocol and 
codecs used by the service. 



Introduction 

Streaming refers to the ability of an application to play synchronised media streams like audio and video streams in a 
continuous way while those streams are being transmitted to the client over a data network. 

Applications, which can be built on top of streaming services, can be classified into on-demand and live information 
delivery applications. Examples of the first category are music and news-on-demand applications. Live delivery of radio 
and television programs are examples of the second category. 

The 3GPP PSS provides a framework for Internet Protocol (IP) based streaming applications in 3G networks. 
1 Scope



The present document specifies the protocols and codecs for the PSS within the 3GPP system. Protocols for control 
signalling, capability exchange, scene description, media transport and media encapsulations are specified. Codecs for 
speech, natural and synthetic audio, video, still images, bitmap graphics, vector graphics, timed text and text are 
specified. 

The present document is applicable to IP based packet switched networks. 



2 References 

The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

[1] 3GPP TS 22.233: "Transparent End-to-End Packet-switched Streaming Service; Stage 1".

[2] 3GPP TS 26.233: "Transparent end-to-end packet switched streaming service (PSS); General 

description". 

[3] 3GPP TR 21.905: "Vocabulary for 3GPP Specifications". 

[4] IETF RFC 1738: "Uniform Resource Locators (URL)", Berners-Lee T., Masinter L. and McCahill 

M., December 1994. 

[5] IETF RFC 2326: "Real Time Streaming Protocol (RTSP)", Schulzrinne H., Rao A. and Lanphier 

R., April 1998. 

[6] IETF RFC 2327: "SDP: Session Description Protocol", Handley M. and Jacobson V., April 1998. 

[7] IETF STD 0006: "User Datagram Protocol", Postel J., August 1980. 

[8] IETF STD 0007: "Transmission Control Protocol", Postel J., September 1981.

[9] IETF RFC 1889: "RTP: A Transport Protocol for Real-Time Applications", Schulzrinne H. et al.,

January 1996. 

[10] IETF RFC 1890: "RTP Profile for Audio and Video Conferences with Minimal Control", 

Schulzrinne H. et al., January 1996. 

[11] IETF RFC 3267: "Real-Time Transport Protocol (RTP) Payload Format and File Storage Format
for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs",
Sjoberg J. et al., June 2002.

[12] (void) 

[13] IETF RFC 3016: "RTP Payload Format for MPEG-4 Audio/Visual Streams", Kikuchi Y. et al., 

November 2000. 

[14] IETF RFC 2429: "RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video 

(H.263+)", Bormann C. et al., October 1998.

[15] IETF RFC 2046: "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", Freed 

N. and Borenstein N., November 1996. 

[16] IETF RFC 3236: "The 'application/xhtml+xml' Media Type", Baker M. and Stark P., January

2002. 

[17] IETF RFC 2616: "Hypertext Transfer Protocol - HTTP/1.1", Fielding R. et al., June 1999. 

[18] 3GPP TS 26.071: "Mandatory Speech CODEC speech processing functions; AMR Speech 

CODEC; General description". 

[19] 3GPP TS 26.101: "Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate 

(AMR) speech codec frame structure". 

[20] 3GPP TS 26.171: "AMR Wideband Speech Codec; General Description".

[21] ISO/IEC 14496-3:2001: "Information technology - Coding of audio-visual objects - Part 3: 

Audio". 

[22] ITU-T Recommendation H.263 (1998): "Video coding for low bit rate communication". 

[23] ITU-T Recommendation H.263 - Annex X (2001): "Annex X: Profiles and levels definition". 

[24] ISO/IEC 14496-2:2001 : "Information technology - Coding of audio-visual objects - Part 2: 

Visual". 

[25] ISO/IEC 14496-2:2001/Amd 2:2002: "Streaming video profile". 

[26] ITU-T Recommendation T.81 (1992) | ISO/IEC 10918-1:1993: "Information technology - Digital

compression and coding of continuous-tone still images - Requirements and guidelines". 

[27] C-Cube Microsystems: "JPEG File Interchange Format", Version 1.02, September 1, 1992. 

[28] W3C Recommendation: "XHTML Basic", http://www.w3.org/TR/2000/REC-xhtml-basic- 

20001219 , December 2000. 

[29] ISO/IEC 10646-1:2000: "Information technology - Universal Multiple-Octet Coded Character Set 

(UCS) - Part 1: Architecture and Basic Multilingual Plane".

[30] The Unicode Consortium: "The Unicode Standard", Version 3.0 Reading, MA, Addison-Wesley

Developers Press, 2000, ISBN 0-201-61633-5. 

[31] W3C Recommendation: "Synchronized Multimedia Integration Language (SMIL 2.0)", 

http://www.w3.org/TR/2001/REC-smil20-20010807/ , August 2001.

[32] CompuServe Incorporated: "GIF Graphics Interchange Format: A Standard defining a mechanism 

for the storage and transmission of raster-based graphics information", Columbus, OH, USA, 
1987. 

[33] CompuServe Incorporated: "Graphics Interchange Format: Version 89a", Columbus, OH, USA, 

1990. 

[34] (void) 

[35] 3GPP TS 26.140: "Multimedia Messaging Service (MMS); Media formats and codecs". 

[36] (void) 

[37] 3GPP TS 26.201: " Speech Codec speech processing functions; AMR Wideband Speech Codec; 

Frame Structure". 

[38] IETF RFC 2083: "PNG (Portable Networks Graphics) Specification Version 1.0", Boutell T., et 

al., March 1997. 

[39] W3C Working Draft Recommendation: "CC/PP structure and vocabularies", 

http://www.w3.org/Mobile/CCPP/Group/Drafts/WD-CCPP-struct-vocab-20010620/ , June 2001. 

[40] WAP UAProf Specification, http://www1.wapforum.org/tech/terms.asp?doc=WAP-248-UAProf-

[42] W3C Recommendation: "Scalable Vector Graphics (SVG) 1.1 Specification",

http://www.w3.org/TR/2003/REC-SVG11-20030114/ , January 2003.

[45] Scalable Polyphony MIDI Device 5-to-24 Note Profile for 3GPP Version 1.0, RP-35, MIDI

Manufacturers Association, Los Angeles, CA, February 2002. 

[46] "Standard MIDI Files 1.0", RP-001, in "The Complete MIDI 1.0 Detailed Specification, Document

Version 96.1", The MIDI Manufacturers Association, Los Angeles, CA, USA, February 1996. 

[47] WAP Forum Specification: "XHTML Mobile Profile", 

http://www1.wapforum.org/tech/terms.asp?doc=WAP-277-XHTMLMP-20011029-a.pdf , October
2001. 

[48] "Unicode Standard Annex #13: Unicode Newline Guidelines", by Mark Davis. An integral part of 

The Unicode Standard, Version 3.1. 

[49] IETF RFC 3266: "Support for IPv6 in Session Description Protocol (SDP)", Olson S., Camarillo 

G. and Roach A. B., June 2002. 

[50] ISO/IEC 14496-12:2003 | 15444-12:2003: "Information technology - Coding of audio-visual
objects - Part 12: ISO base media file format" | "Information technology - JPEG 2000 image
coding system - Part 12: ISO base media file format".

[51] ISO/IEC 14496-14:2003: "Information technology - Coding of audio-visual objects - Part 14: 

MP4 file format". 

[52] IETF RFC 3578: "SDP bandwidth modifier for RTCP bandwidth". 



3 Definitions and abbreviations 

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply: 

continuous media: media with an inherent notion of time. In the present document speech, audio, video and timed text 

discrete media: media that itself does not contain an element of time. In the present document all media not defined as 
continuous media 

device capability description: a description of device capabilities and/or user preferences. Contains a number of 
capability attributes 

device capability profile: same as device capability description 

presentation description: contains information about one or more media streams within a presentation, such as the set 
of encodings, network addresses and information about the content 

PSS client: client for the 3GPP packet switched streaming service based on the IETF RTSP/SDP and/or HTTP 
standards, with possible additional 3GPP requirements according to the present document 

PSS server: server for the 3GPP packet switched streaming service based on the IETF RTSP/SDP and/or HTTP 
standards, with possible additional 3GPP requirements according to the present document 

scene description: description of the spatial layout and temporal behaviour of a presentation. It can also contain
hyperlinks 



3.2 Abbreviations



For the purposes of the present document, the abbreviations given in 3GPP TR 21.905 [3] and the following apply. 

3GP 3GPP file format 

AAC Advanced Audio Coding 

BIFS Binary Format for Scenes 

CC/PP Composite Capability / Preference Profiles 

DCT Discrete Cosine Transform 

GIF Graphics Interchange Format 

HTML Hyper Text Markup Language 

ITU-T International Telecommunications Union - Telecommunications 

JFIF JPEG File Interchange Format 

MIDI Musical Instrument Digital Interface 

MIME Multipurpose Internet Mail Extensions 

MMS Multimedia Messaging Service 

MP4 MPEG-4 file format 

PNG Portable Networks Graphics 

PSS Packet-switched Streaming Service 

QCIF Quarter Common Intermediate Format 

RDF Resource Description Framework 

RTCP RTP Control Protocol 

RTP Real-time Transport Protocol 

RTSP Real-Time Streaming Protocol 

SDP Session Description Protocol 

SMIL Synchronised Multimedia Integration Language 

SP-MIDI Scalable Polyphony MIDI 

SVG Scalable Vector Graphics 

UAProf User Agent Profile 

UCS-2 Universal Character Set (the two octet form) 

UTF-8 Unicode Transformation Format (the 8-bit form) 

UTF-16 Unicode Transformation Format (the 16-bit form) 

W3C WWW Consortium 

WML Wireless Markup Language 

XHTML eXtensible Hyper Text Markup Language

XML eXtensible Markup Language

Figure 1: Functional components of a PSS client

Figure 1 shows the functional components of a PSS client. Figure 2 gives an overview of the protocol stack used in a
PSS client and also shows a more detailed view of the packet based network interface. The functional components can 
be divided into control, scene description, media codecs and the transport of media and control data. 

The control related elements are session establishment, capability exchange and session control (see clause 5). 

- Session establishment refers to methods to invoke a PSS session from a browser or directly by entering a URL
  in the terminal's user interface.

- Capability exchange enables choice or adaptation of media streams depending on different terminal capabilities.

- Session control deals with the set-up of the individual media streams between a PSS client and one or several
  PSS servers. It also enables control of the individual media streams by the user. It may involve VCR-like
  presentation control functions like start, pause, fast forward and stop of a media presentation.

The scene description consists of the spatial layout and a description of the temporal relation between the different
media that are included in the media presentation. The former gives the layout of the different media components on the
screen and the latter controls the synchronisation of the different media (see clause 8).

The PSS includes media codecs for video, still images, vector graphics, bitmap graphics, text, timed text, natural and 
synthetic audio, and speech (see clause 7). 

Transport of media and control data consists of the encapsulation of the coded media and control data in a transport 
protocol (see clause 6). This is shown in figure 1 as the "packet based network interface" and displayed in more detail in 
the protocol stack of figure 2. 

Figure 2: Overview of the protocol stack



5 Protocols

5.1 Session establishment



Session establishment refers to the method by which a PSS client obtains the initial session description. The initial
session description can be, for example, a presentation description, a scene description or just a URL to the content.

A PSS client shall support initial session descriptions specified in one of the following formats: SMIL, SDP, or plain 
RTSP URL. 

In addition to rtsp:// the PSS client shall support URLs [4] to valid initial session descriptions starting with file:// (for
locally stored files) and http:// (for presentation descriptions or scene descriptions delivered via HTTP). 

Examples for valid inputs to a PSS client are: file://temp/morning_news.smil, http://mediaportal/morning_news.sdp, 
and rtsp://mediaportal/morning_news. 
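
As an informative illustration only, the three example inputs above could be classified by URL scheme roughly as in the following Python sketch. The function name and the mapping from the file extensions .smil and .sdp to a description format are assumptions made for this example; the present document does not define extension-based detection.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from file extension to description format.
# This is an assumption for illustration, not a rule of the specification.
_EXTENSION_FORMATS = {".smil": "SMIL", ".sdp": "SDP"}


def classify_initial_description(url: str) -> str:
    """Return "RTSP URL", "SMIL", "SDP" or "unknown" for a supported URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme == "rtsp":
        # A plain RTSP URL is itself a valid initial session description.
        return "RTSP URL"
    if scheme in ("file", "http"):
        path = parts.path.lower()
        for extension, description_format in _EXTENSION_FORMATS.items():
            if path.endswith(extension):
                return description_format
        return "unknown"
    raise ValueError(f"unsupported URL scheme: {scheme!r}")


if __name__ == "__main__":
    for example in (
        "file://temp/morning_news.smil",
        "http://mediaportal/morning_news.sdp",
        "rtsp://mediaportal/morning_news",
    ):
        print(example, "->", classify_initial_description(example))
```

A real client would more likely rely on the MIME type returned by the HTTP server than on the file extension.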

URLs can be made available to a PSS client in many different ways. It is out of the scope of this recommendation to 
mandate any specific mechanism. However, an application using the 3GPP PSS shall at least support URLs of the 
above type, specified or selected by the user. 

The preferred way would be to embed URLs to initial session descriptions within HTML or WML pages. Browser 
applications that support the HTTP protocol could then download the initial session description and pass the content to 
the PSS client for further processing. How exactly this is done is an implementation specific issue and out of the scope 
of this recommendation. 

5.2 Capability exchange 

5.2.1 General 

Capability exchange is an important functionality in the PSS. It enables PSS servers to provide a wide range of devices 
with content suitable for the particular device in question. Another very important task is to provide a smooth transition 
between different releases of PSS. Therefore, PSS clients and servers should support capability exchange. 

The specification of capability exchange for PSS is divided into two parts: a normative part contained in clause 5.2
and an informative part in clause A.4 of Annex A of the present document. The normative part gives all the necessary
requirements that a client or server shall conform to when implementing capability exchange in the PSS. The
informative part provides additional important information for understanding the concept and usage of the functionality.
It is recommended to read clause A.4 in Annex A before continuing with clauses 5.2.2-5.2.7.

5.2.2 The device capability profile structure 

A device capability profile is an RDF [41] document that follows the structure of the CC/PP framework [39] and the
CC/PP application UAProf [40]. Attributes are used to specify device capabilities and preferences. A set of attribute
names, permissible values and semantics constitutes a CC/PP vocabulary, which is defined by an RDF schema. For PSS
the UAProf vocabulary is reused and an additional PSS specific vocabulary is defined. The details can be found in
clause 5.2.3. The vocabulary schema defines the syntax of the attributes and also, to some extent, their semantics. A
PSS device capability profile is an instance of the schema (UAProf and/or the PSS specific schema) and shall follow the
rules governing the formation of a profile given in the CC/PP specification [39]. The profile schema shall also be
governed by the rules defined in UAProf [40] chapter 7, 7.1, 7.3 and 7.4.

5.2.3 Vocabularies for PSS 

5.2.3.1 General 

Clause 5.2.3 specifies the attribute vocabularies to be used by the PSS capability exchange. 

PSS servers should understand the attributes in both the streaming component of the PSS base vocabulary and the 
recommended attributes from the UAProf vocabulary [40]. A server may additionally support other UAProf attributes. 

5.2.3.2 PSS base vocabulary 

The PSS base vocabulary contains one component called "Streaming". A vocabulary extension to UAProf shall be
defined as an RDF schema. This schema can be found in Annex F. The schema, together with the description of the
attributes in the present clause, defines the vocabulary. The vocabulary is associated with an XML namespace, which
combines a base URI with a local XML element name to yield a URI. Annex F provides the details.

All PSS attributes are put in a PSS specific component called "Streaming". The list of PSS attributes is as follows: 

Attribute name: AudioChannels 

Attribute definition: This attribute describes the stereophonic capability of the natural audio device.

Component: Streaming 

Type: Literal 

Legal values: "Mono", "Stereo" 

Resolution rule: Locked 

EXAMPLE 1: <AudioChannels>Mono</AudioChannels> 



Attribute name: MaxPolyphony

Attribute definition: The MaxPolyphony attribute refers to the maximal polyphony that the synthetic audio device
supports as defined in [44].

NOTE: MaxPolyphony attribute can be used to signal the maximum polyphony capabilities supported by the PSS 
client. This is a complementary mechanism for the delivery of compatible SP-MIDI content and thus the 
PSS client is required to support Scalable Polyphony MIDI i.e. Channel Masking defined in [44]. 

Component: Streaming 

Type: Number 

Legal values: Integer between 5 and 24 

Resolution rule: Locked 

EXAMPLE 2: <MaxPolyphony>8</MaxPolyphony> 



Attribute name: PssAccept

Attribute definition: List of content types (MIME types) the PSS application supports. Both CcppAccept
(SoftwarePlatform, UAProf) and PssAccept can be used but if PssAccept is defined it has
precedence over CcppAccept.

Component: Streaming

Type: Literal (Bag)

Legal values: List of MIME types with related parameters.

Resolution rule: Append

EXAMPLE 3: <PssAccept>
             <rdf:Bag>
               <rdf:li>audio/AMR-WB; octet-alignment</rdf:li>
               <rdf:li>application/smil</rdf:li>
             </rdf:Bag>
           </PssAccept>



Attribute name: PssAccept-Subset

Attribute definition: List of content types for which the PSS application supports a subset. MIME-types can in most
cases effectively be used to express variations in support for different media types. Many
MIME-types, e.g. AMR-NB, have several parameters that can be used for this purpose. There
may exist content types for which the PSS application only supports a subset and this subset
cannot be expressed with MIME-type parameters. In these cases the attribute PssAccept-Subset
is used to describe support for a subset of a specific content type. If a subset of a
specific content type is declared in PssAccept-Subset, this means that PssAccept-Subset has
precedence over both PssAccept and CcppAccept. PssAccept and/or CcppAccept shall always
include the corresponding content types for which PssAccept-Subset specifies subsets.
This is to ensure compatibility with those content servers that do not understand the
PssAccept-Subset attribute but do understand e.g. CcppAccept.

This is illustrated with an example. If PssAccept="audio/AMR", "image/jpeg" and
PssAccept-Subset="JPEG-PSS" then "audio/AMR" and JPEG Baseline are supported. "image/jpeg" in
PssAccept is of no importance since it is related to "JPEG-PSS" in PssAccept-Subset. Subset
identifiers and corresponding semantics shall only be defined by the TSG responsible for the
present document. The following values are defined:

- "JPEG-PSS": Only the two JPEG modes described in clause 7.5 of the present document
  are supported.

- "SVG-Tiny"

- "SVG-Basic"

Component: Streaming

Type: Literal (Bag)

Legal values: "JPEG-PSS", "SVG-Tiny", "SVG-Basic"

Resolution rule: Append

EXAMPLE 4: <PssAccept-Subset>
             <rdf:Bag>
               <rdf:li>JPEG-PSS</rdf:li>
             </rdf:Bag>
           </PssAccept-Subset>
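
The precedence between CcppAccept, PssAccept and PssAccept-Subset described above can be illustrated by the following informative Python sketch. The function name, the dictionary-based representation and the mapping from subset identifiers to content types (in particular "image/svg+xml" for the SVG subsets) are assumptions made for this example and are not defined by the present document.

```python
# Assumed mapping from subset identifier to the content type it restricts.
SUBSET_CONTENT_TYPE = {
    "JPEG-PSS": "image/jpeg",
    "SVG-Tiny": "image/svg+xml",
    "SVG-Basic": "image/svg+xml",
}


def effective_support(pss_accept, ccpp_accept, pss_accept_subset):
    """Map each accepted content type to the subset identifiers restricting it.

    An empty list means the content type is supported without a declared
    subset restriction.
    """
    # PssAccept, if defined, has precedence over CcppAccept.
    accepted = list(pss_accept) if pss_accept else list(ccpp_accept)

    # Collect the declared subsets per content type; identifiers that are
    # not understood are ignored in this sketch.
    restrictions = {}
    for identifier in pss_accept_subset:
        content_type = SUBSET_CONTENT_TYPE.get(identifier)
        if content_type is not None:
            restrictions.setdefault(content_type, []).append(identifier)

    support = {}
    for entry in accepted:
        # Drop MIME parameters such as "; octet-alignment".
        content_type = entry.split(";")[0].strip()
        support[content_type] = restrictions.get(content_type, [])
    return support


if __name__ == "__main__":
    print(effective_support(["audio/AMR", "image/jpeg"], [], ["JPEG-PSS"]))
```

With the values from the example in the attribute definition, "audio/AMR" is reported without restriction and "image/jpeg" is reported as restricted to "JPEG-PSS".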



Attribute name: PssVersion

Attribute definition: PSS version supported by the client. 

Component: Streaming 

Type: Literal 

Legal values: "3GPP-R4", "3GPP-R5" and so forth. 

Resolution rule: Locked 

EXAMPLE 5: <PssVersion>3GPP-R4</PssVersion> 

Attribute name: RenderingScreenSize 

Attribute definition: The rendering size of the device's screen in unit of pixels. The horizontal size is given 
followed by the vertical size. 



Component: Streaming

Type: Dimension

Legal values: Two integer values equal to or greater than zero. A value equal to "0x0" means that there exists no
possibility to render visual PSS presentations.

Resolution rule: Locked

EXAMPLE 6: <RenderingScreenSize>70x15</RenderingScreenSize>
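
As an informative sketch, a server could interpret a RenderingScreenSize value as shown below. The function name and the returned tuple layout are assumptions made for this example.

```python
def parse_rendering_screen_size(value: str):
    """Parse "<horizontal>x<vertical>" into (width, height, can_render)."""
    width_text, separator, height_text = value.strip().partition("x")
    if not separator or not width_text.isdigit() or not height_text.isdigit():
        raise ValueError(f"not a valid Dimension value: {value!r}")
    width, height = int(width_text), int(height_text)
    # "0x0" signals that visual PSS presentations cannot be rendered.
    can_render = (width, height) != (0, 0)
    return width, height, can_render


if __name__ == "__main__":
    print(parse_rendering_screen_size("70x15"))
    print(parse_rendering_screen_size("0x0"))
```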

Attribute name: SmilBaseSet 




3GPP TS 26.234 version 5.5.0 Release 5 




ETSI TS 126 234 V5.5.0 (2003-06) 



Attribute definition: Indicates a base set of SMIL 2.0 modules that the client supports. 

Component: Streaming 

Type: Literal 

Legal values: Pre-defined identifiers. "SMIL-3GPP-R4" indicates all SMIL 2.0 modules required for scene 

description support according to clause 8 of Release 4 of TS 26.234. "SMIL-3GPP-R5" 
indicates all SMIL 2.0 modules required for scene description support according to clause 8 of 
the present document (Release 5 of TS 26.234). 

Resolution rule: Locked 

EXAMPLE 7: <SmilBaseSet>SMIL-3GPP-R4</SmilBaseSet> 



Attribute name: SmilModules



Attribute definition: This attribute defines a list of SMIL 2.0 modules supported by the client. If the SmilBaseSet is 
used those modules do not need to be explicitly listed here. In that case only additional module 
support needs to be listed. 

Component: Streaming 

Type: Literal (Bag) 

Legal values: SMIL 2.0 module names defined in the SMIL 2.0 recommendation [31], section 2.3.3, table 2. 

Resolution rule: Append 

EXAMPLE 8: <SmilModules>
             <rdf:Bag>
               <rdf:li>BasicTransitions</rdf:li>
               <rdf:li>MultiArcTiming</rdf:li>
             </rdf:Bag>
           </SmilModules>



Attribute name: VideoDecodingByteRate 

Attribute definition: If Annex G is not supported, the attribute has no meaning. If Annex G is supported, this 

attribute defines the peak decoding byte rate the PSS client is able to support. In other words, 
the PSS client fulfils the requirements given in Annex G with the signalled peak decoding byte 
rate. The values are given in bytes per second and shall be greater than or equal to 8000. 
According to Annex G, 8000 is the default peak decoding byte rate for the mandatory video 
codec profile and level (H.263 Profile 0 Level 10).

Component: Streaming 

Type: Number 

Legal values: Integer value greater than or equal to 8000. 

Resolution rule: Locked 

EXAMPLE 9: <VideoDecodingByteRate>16000</VideoDecodingByteRate> 



Attribute name: VideoInitialPostDecoderBufferingPeriod 

Attribute definition: If Annex G is not supported, the attribute has no meaning. If Annex G is supported, this 
attribute defines the maximum initial post-decoder buffering period of video. Values are 
interpreted as clock ticks of a 90-kHz clock. In other words, the value is incremented by one 
for each 1/90 000 seconds. For example, the value 9000 corresponds to 1/10 of a second initial 
post-decoder buffering. 

Component: Streaming

Type: Number 

Legal values: Integer value equal to or greater than zero. 

Resolution rule: Locked 

EXAMPLE 10: <VideoInitialPostDecoderBufferingPeriod>9000</VideoInitialPostDecoderBufferingPeriod>
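
The conversion between 90-kHz clock ticks and seconds used by this attribute can be checked with the following informative Python sketch. The function name is an assumption made for this example; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

CLOCK_HZ = 90_000  # the attribute value counts ticks of a 90-kHz clock


def buffering_period_seconds(ticks: int) -> Fraction:
    """Convert a VideoInitialPostDecoderBufferingPeriod value to seconds."""
    if ticks < 0:
        raise ValueError("the attribute value shall not be negative")
    return Fraction(ticks, CLOCK_HZ)


if __name__ == "__main__":
    # The value 9000 corresponds to 1/10 of a second.
    print(buffering_period_seconds(9000))
```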

Attribute name: VideoPreDecoderBufferSize 

Attribute definition: This attribute signals if the optional video buffering requirements defined in Annex G are 

supported. It also defines the size of the hypothetical pre-decoder buffer defined in Annex G. A 
value equal to zero means that Annex G is not supported. A value equal to one means that 
Annex G is supported. In this case the size of the buffer is the default size defined in Annex G. 
A value equal to or greater than the default buffer size defined in Annex G means that Annex 
G is supported and sets the buffer size to the given number of octets. 

Component: Streaming 

Type: Number 

Legal values: Integer value equal to or greater than zero. Values greater than one but less than the default 

buffer size defined in Annex G are not allowed. 

Resolution rule: Locked 

EXAMPLE 11: <VideoPreDecoderBufferSize>30720</VideoPreDecoderBufferSize>
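
The three cases of the VideoPreDecoderBufferSize value (zero, one, or an explicit size) can be illustrated by the following informative Python sketch. The function name is an assumption, and the default buffer size is passed in as a parameter because its value is defined in Annex G and not in this clause; the figure 20480 used in the demonstration is a placeholder only.

```python
def interpret_pre_decoder_buffer_size(value: int, default_size: int):
    """Return (annex_g_supported, buffer_size_in_octets or None)."""
    if value < 0:
        raise ValueError("the attribute value shall not be negative")
    if value == 0:
        # Annex G is not supported, so no buffer size applies.
        return (False, None)
    if value == 1:
        # Annex G is supported with the default buffer size.
        return (True, default_size)
    if value < default_size:
        # Values greater than one but less than the default are not allowed.
        raise ValueError("value is below the default buffer size of Annex G")
    # Annex G is supported with the signalled buffer size.
    return (True, value)


if __name__ == "__main__":
    placeholder_default = 20480  # placeholder, see Annex G for the real value
    print(interpret_pre_decoder_buffer_size(30720, placeholder_default))
```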

5.2.3.3 Attributes from UAProf 

In the UAProf vocabulary [40] there are several attributes that are of interest for the PSS. The formal definition of these 
attributes is given in [40]. The following list of attributes is recommended for PSS applications: 

Attribute name: BitsPerPixel 

Component: HardwarePlatform 

Attribute description: The number of bits of colour or greyscale information per pixel 

EXAMPLE 1: <BitsPerPixel>8</BitsPerPixel> 



Attribute name: ColorCapable

Component: HardwarePlatform 

Attribute description: Whether the device display supports colour or not. 

EXAMPLE 2: <ColorCapable>Yes</ColorCapable> 

Attribute name: PixelAspectRatio 

Component: HardwarePlatform 

Attribute description: Ratio of pixel width to pixel height 

EXAMPLE 3: <PixelAspectRatio>1x2</PixelAspectRatio>

Attribute name: PointingResolution

Component: HardwarePlatform 

Attribute description: Type of resolution of the pointing accessory supported by the device. 

EXAMPLE 4: <PointingResolution>Pixel</PointingResolution> 

Attribute name: Model 

Component: HardwarePlatform 

Attribute description: Model number assigned to the terminal device by the vendor or manufacturer 

EXAMPLE 5: <Model>Lexus</Model> 

Attribute name: Vendor 

Component: HardwarePlatform 

Attribute description: Name of the vendor manufacturing the terminal device 

EXAMPLE 6: <Vendor>Toyota</Vendor> 

Attribute name: CcppAccept-Charset 

Component: SoftwarePlatform 

Attribute description: List of character sets the device supports 

EXAMPLE 7: <CcppAccept-Charset>
             <rdf:Bag>
               <rdf:li>UTF-8</rdf:li>
             </rdf:Bag>
           </CcppAccept-Charset>

Attribute name: CcppAccept-Encoding 

Component: SoftwarePlatform 

Attribute description: List of transfer encodings the device supports 

EXAMPLE 8: <CcppAccept-Encoding>
             <rdf:Bag>
               <rdf:li>base64</rdf:li>
             </rdf:Bag>
           </CcppAccept-Encoding>

Attribute name: CcppAccept-Language 

Component: SoftwarePlatform 

Attribute description: List of preferred document languages 

EXAMPLE 9: <CcppAccept-Language>
             <rdf:Seq>
               <rdf:li>en</rdf:li>
               <rdf:li>se</rdf:li>
             </rdf:Seq>
           </CcppAccept-Language>

5.2.4 Extensions to the PSS schema/vocabulary 

The use of RDF enables an extensibility mechanism for CC/PP-based schemas that addresses the evolution of new types 
of devices and applications. The PSS profile schema specification is going to provide a base vocabulary but in the 
future new usage scenarios might have need for expressing new attributes. If the base vocabulary is updated a new 
unique namespace will be assigned to the updated schema. The base vocabulary shall only be changed by the TSG 
responsible for the present document. All extensions to the profile schema shall be governed by the rules defined in [40] 
clause 7.7. 

5.2.5 Signalling of profile information between client and server 

When a PSS client or server supports capability exchange it shall support the profile information transport over both
HTTP and RTSP between client and server as defined in clause 9.1 (including its subsections) of the WAP 2.0 UAProf
specification [40] with the following additions:

- The "x-wap-profile" and "x-wap-profile-diff" headers need not be present in every HTTP or RTSP request. That is,
  the requirement to send these headers in all requests has been relaxed.

- The defined headers may be applied to both RTSP and HTTP.

- The "x-wap-profile-diff" header is only valid for the current request. The reason is that PSS does not have the
  WSP session concept of WAP.

- Push is not relevant for the PSS.

The following recommendations are made on how and when profile information should be sent between client and
server:

- PSS content servers supporting capability exchange shall be able to receive profile information in all HTTP and
  RTSP requests.

- The terminal should not send the "x-wap-profile-diff" header over the air-interface since there is no compression
  scheme defined.

- RTSP: the client should send profile information in the DESCRIBE message. It may send it in any other request.

If the terminal has some prior knowledge about the file type it is about to retrieve, e.g. from the file extension, the
following applies:

- HTTP and SDP: when retrieving an SDP with HTTP the client should include profile information in the GET
  request. This way the HTTP server can deliver an optimised SDP to the client.

- HTTP and SMIL: when retrieving a SMIL file with HTTP the client should include profile information in the
  GET request. This way the HTTP server can deliver an optimised SMIL presentation to the client. A SMIL
  presentation can include links to static media. The server should optimise the SMIL file so that links to the
  referenced static media are adapted to the requesting client. When the "x-wap-profile-warning" indicates that
  content selection has been applied (201-203) the PSS client should assume that no more capability exchange has
  to be performed for the static media components. In this case it should not send any profile information when
  retrieving static media to be included in the SMIL presentation. This will minimise the HTTP header overhead.
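
The client-side decision for static media referenced from a SMIL presentation can be sketched as follows. This Python fragment is informative only; the function names are assumptions, and the sketch assumes that the "x-wap-profile-warning" header value begins with the numeric warning code.

```python
def content_selection_applied(warning_value: str) -> bool:
    """True if the warning code is in the range 201-203."""
    tokens = warning_value.split()
    if not tokens or not tokens[0].isdigit():
        return False
    return 201 <= int(tokens[0]) <= 203


def send_profile_for_static_media(warning_value) -> bool:
    """Decide whether to attach profile information to static media requests.

    warning_value is the "x-wap-profile-warning" value received with the
    SMIL presentation, or None if the header was absent.
    """
    if warning_value is None:
        return True
    # Content selection already applied: omit the profile headers to
    # minimise the HTTP header overhead.
    return not content_selection_applied(warning_value)


if __name__ == "__main__":
    for value in (None, "200", "201", "203"):
        print(value, "->", send_profile_for_static_media(value))
```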

5.2.6 Merging device capability profiles 

Profiles need to be merged whenever the PSS server receives multiple device capability profiles. Multiple occurrences 
of attributes and default values make it necessary to resolve the profiles according to a resolution process. 

The resolution process shall be the same as defined in UAProf [40] clause 6.4.1:




- Resolve all indirect references by retrieving URI references contained within the profile.

- Resolve each profile and profile-diff document by first applying attribute values contained in the default URI
  references and by second applying overriding attribute values contained within the category blocks of that profile
  or profile-diff.

- Determine the final value of the attributes by applying the resolved attribute values from each profile and
  profile-diff in order, with the attribute values determined by the resolution rules provided in the schema. Where no
  resolution rules are provided for a particular attribute in the schema, values provided in profiles or profile-diffs
  are assumed to override values provided in previous profiles or profile-diffs.

When several URLs are defined in the "x-wap-profile" header and an attribute occurs more than once in these 
profiles, the attribute value in the second URL overrides, is overridden by, or is appended to the attribute value 
from the first URL (according to the resolution rule), and so forth. This is what is meant by 
"Determine the final value of the attributes by applying the resolved attribute values from each profile and profile-diff 
in order, with..." in the third bullet above. If the profile is completely or partly inaccessible or otherwise corrupted, the 
server should still provide content to the client. The server is responsible for delivering content optimised for the client 
based on the received profile in a best effort manner. 
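
The in-order merging described above can be sketched in Python as follows. This is an illustration only: it assumes that each profile has already been resolved into a dictionary, that the resolution rules are given as a mapping from attribute name to one of the labels "Locked", "Override" or "Append" (labels chosen here to match the three behaviours mentioned above), and that appendable attributes hold lists. The attribute names in the usage note are illustrative.

```python
def merge_profiles(profiles, rules):
    """Merge resolved profiles in header order.

    profiles: list of dicts, first URL first.
    rules: attribute name -> "Locked" | "Override" | "Append" (assumed labels).
    Attributes without a rule are overridden by later profiles.
    """
    merged = {}
    for profile in profiles:
        for name, value in profile.items():
            rule = rules.get(name, "Override")
            if name not in merged:
                # First occurrence: copy lists so inputs are not mutated.
                merged[name] = list(value) if rule == "Append" else value
            elif rule == "Locked":
                continue  # the earlier value is kept
            elif rule == "Append":
                merged[name] = merged[name] + list(value)
            else:
                merged[name] = value  # the later value overrides
    return merged
```

For example, merging {"ScreenSize": "176x144", "Model": "A"} with {"ScreenSize": "352x288", "Model": "B"} under the rules {"Model": "Locked"} keeps Model "A" and takes ScreenSize from the second profile.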

NOTE: For the reasons explained in Annex A clause A.4.3 the usage of indirect references in profiles (using the 
CC/PP defaults element) is not recommended. 

5.2.7 Profile transfer between the PSS server and the device profile server 

The device capability profiles are stored on a device profile server and referenced with URLs. According to the profile 
resolution process in clause 5.2.6 of the present document, the PSS server ends up with a number of URLs referring to 
profiles and these shall be retrieved. 

The device profile server shall support HTTP 1.1 for the transfer of device capability profiles to the PSS server. 

If the PSS server supports capability exchange it shall support HTTP 1.1 for transfer of device capability profiles 
from the device profile server. A URL shall be used to identify a device capability profile. 

Normal content caching provisions as defined by HTTP apply. 

5.3 Session set-up and control 

5.3.1 General 

Continuous media is media that has an intrinsic time line. Discrete media, on the other hand, does not itself contain an 
element of time. In this specification speech, audio and video belong to the first category and still images and text to the 
latter one. 

Streaming of continuous media using RTP/UDP/IP (see clause 6.2) requires a session control protocol to set up and 
control the individual media streams. For the transport of discrete media (images and text), vector graphics, timed 
text and synthetic audio this specification adopts the use of HTTP/TCP/IP (see clause 6.3). In this case there is no need 
for a separate session set-up and control protocol since this is built into HTTP. This clause describes session set-up and 
control of the continuous media speech, audio and video. 

5.3.2 RTSP 

RTSP [5] shall be used for session set-up and session control. PSS clients and servers shall follow the rules for minimal 
on-demand playback RTSP implementations in appendix D of [5]. In addition to this: 

PSS servers and clients shall implement the DESCRIBE method (see clause 10.2 in [5]); 

PSS servers and clients shall implement the Range header field (see clause 12.29 in [5]); 

PSS servers shall include the Range header field in all PLAY responses. 
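
As an illustration of the last requirement, a server could format the Range header field of a PLAY response as in the following Python sketch. It assumes normal play time (npt) in seconds with millisecond precision; the helper name is hypothetical and the exact syntax is defined in [5].

```python
def npt_range_header(start_s, end_s=None):
    """Format an RTSP Range header line using normal play time (npt).

    An open-ended range (end_s is None) leaves the end time empty.
    """
    end = "" if end_s is None else f"{end_s:.3f}"
    return f"Range: npt={start_s:.3f}-{end}"
```

For a clip played from its start to 7.741 seconds this yields "Range: npt=0.000-7.741".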

5.3.3 SDP 

5.3.3.1 General 

RTSP requires a presentation description. SDP shall be used as the format of the presentation description for both PSS 
clients and servers. PSS servers shall provide and clients interpret the SDP syntax according to the SDP specification 
[6] and appendix C of [5]. The SDP delivered to the PSS client shall declare the media types to be used in the session 
using a codec specific MIME media type for each media. MIME media types to be used in the SDP file are described in 
clause 5.4 of the present document. 

The SDP [6] specification requires certain fields to always be included in an SDP file. Apart from this a PSS server 
shall always include the following fields in the SDP: 

"a=control:" according to clauses C.1.1, C.2 and C.3 in [5]; 

"a=range:" according to clause C.1.5 in [5]; 

"a=rtpmap:" according to clause 6 in [6]; 

"a=fmtp:" according to clause 6 in [6]. 

When an SDP document is generated for media stored in a 3GP file, each control URL defined at the media-level 
"a=control:" field shall include a stream identifier in the last segment of the path component of the URL. The value of 
the stream id shall be defined by the track-ID field in the track header (tkhd) atom associated with the media track. 
When a PSS server receives a set-up request for a stream, it shall use the stream identifier specified in the URL to map 
the request to a media track with a matching track-ID field in the 3GP file. Stream identifiers shall be expressed using 
the following syntax: 

streamIdentifier = <stream_id_token>"="<stream_id> 

stream_id_token = 1*alpha 

stream_id = 1*digit 
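
The mapping from a control URL to a track-ID can be illustrated with a short Python sketch. It assumes the syntax given above (one or more letters, "=", one or more digits) in the last path segment; the function name and the example URL are hypothetical.

```python
import re
from urllib.parse import urlsplit

# stream_id_token "=" stream_id, as in the syntax above.
_STREAM_ID = re.compile(r"^([A-Za-z]+)=([0-9]+)$")


def track_id_from_control_url(url):
    """Return the numeric stream id found in the last path segment."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    match = _STREAM_ID.match(last_segment)
    if match is None:
        raise ValueError(f"no stream identifier in {url!r}")
    return int(match.group(2))
```

A server would use the returned value to look up the media track whose track header carries the same track-ID.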

The bandwidth field in SDP is needed by the client in order to properly set up QoS parameters. Therefore, a PSS server 
shall include the "b=AS:" field at the media level for each media stream in SDP, and a PSS client shall interpret this 
field. When a PSS client receives SDP, it should ignore the session level "b=AS:" parameter (if present), and instead 
calculate session bandwidth from the media level bandwidth values of the relevant streams. A PSS client shall also 
handle the case where the bandwidth parameter is not present, since this may occur when connecting to a Release-4 
server. 

Note that for RTP based applications, "b=AS:" gives the RTP "session bandwidth" (including UDP/IP overhead) as 
defined in section 6.2 of [9]. 

The bandwidth for RTCP traffic shall be described using the "RS" and "RR" SDP bandwidth modifiers, as specified by 
[52]. The "RS" SDP bandwidth modifier indicates the RTCP bandwidth allocated to the sender (i.e. PSS server) and 
"RR" indicates the RTCP bandwidth allocated to the receiver (i.e. PSS client). A PSS server shall include the "b=RS:" 
and "b=RR:" fields at the media level for each media stream in SDP, and a PSS client shall interpret them. A PSS client 
shall also handle the case where the bandwidth modifier is not present according to section 3 of [52], since this may 
occur when connecting to a Release-4 server. 

There shall be a limit on the allowed RTCP bandwidth for senders and receivers in a session. This limit is defined as 
follows: 

• 4000 bps for the RS field (at media level); 

• 5000 bps for the RR field (at media level). 

The default value for each of the "RS" and "RR" SDP bandwidth modifiers is 2.5% of the session bandwidth given by 
the "b=AS" parameter. 

Annex A.2.1 presents an example SDP in which the limit for the total RTCP bandwidth is 5% of the session 
bandwidth. 
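
The default and the limits above can be illustrated with the following Python sketch. It assumes that "b=AS:" is expressed in kilobits per second and that "b=RS:" and "b=RR:" are expressed in bits per second, as in [52]; the function names are hypothetical, and the sketch does not decide how a default exceeding a limit is to be handled.

```python
RS_LIMIT_BPS = 4000  # media-level limit for the RS field
RR_LIMIT_BPS = 5000  # media-level limit for the RR field


def default_rtcp_bandwidth_bps(as_kbps):
    """Default for each of RS and RR: 2.5% of the session bandwidth.

    Integer arithmetic avoids floating-point rounding.
    """
    return (as_kbps * 1000 * 25) // 1000


def within_rtcp_limits(rs_bps, rr_bps):
    """Check media-level RS and RR values against the limits above."""
    return rs_bps <= RS_LIMIT_BPS and rr_bps <= RR_LIMIT_BPS
```

For a stream declared with "b=AS:64", each default evaluates to 1600 bps, which is within both limits.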

IPv6 addresses in SDP descriptions shall be supported according to RFC 3266 [49]. 

NOTE: The SDP parsers and/or interpreters shall be able to accept NULL values in the 'c=' field (e.g. 0.0.0.0 in IPv4 
case). This may happen when the media content does not have a fixed destination address. For more 
details, see Section C.1.7 of [5] and Section 6 of [6]. 

5.3.3.2 Additional SDP fields 

The following Annex G-related media level SDP fields are defined for PSS: 

"a=X-predecbufsize:<size of the hypothetical pre-decoder buffer>" 

This gives the suggested size of the Annex G hypothetical pre-decoder buffer in bytes. 

"a=X-initpredecbufperiod:<initial pre-decoder buffering period>" 

This gives the required initial pre-decoder buffering period specified according to Annex G. Values are 
interpreted as clock ticks of a 90-kHz clock. That is, the value is incremented by one for each 1/90 000 seconds. 
For example, value 180 000 corresponds to a two second initial pre-decoder buffering. 

"a=X-initpostdecbufperiod:<initial post-decoder buffering period>" 

This gives the required initial post-decoder buffering period specified according to Annex G. Values are 
interpreted as clock ticks of a 90-kHz clock. 

"a=X-decbyterate:<peak decoding byte rate>" 

This gives the peak decoding byte rate that was used to verify the compatibility of the stream with Annex G. 
Values are given in bytes per second. 

If none of the attributes "a=X-predecbufsize:", "a=X-initpredecbufperiod:", "a=X-initpostdecbufperiod:", and 
"a=X-decbyterate:" is present, clients should not expect a packet stream according to Annex G. If at least one of the listed 
attributes is present, the transmitted video packet stream shall conform to Annex G. If at least one of the listed attributes 
is present, but some of the listed attributes are missing in an SDP description, clients should expect a default value for 
the missing attributes according to Annex G. 
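
A client could collect these attributes and convert the 90-kHz tick values as in the following Python sketch. It assumes that the media-level SDP lines are passed as a list of strings and that all four values are non-negative integers; the function names are hypothetical. Default values for missing attributes are defined in Annex G and are not reproduced here.

```python
ANNEX_G_ATTRIBUTES = (
    "X-predecbufsize",
    "X-initpredecbufperiod",
    "X-initpostdecbufperiod",
    "X-decbyterate",
)


def parse_annex_g(sdp_media_lines):
    """Return the Annex G attributes found in one media description.

    An empty result means that no Annex G stream should be expected.
    """
    found = {}
    for line in sdp_media_lines:
        if not line.startswith("a="):
            continue
        name, sep, value = line[2:].partition(":")
        if sep and name in ANNEX_G_ATTRIBUTES:
            found[name] = int(value.strip())
    return found


def ticks_to_seconds(ticks):
    """Convert clock ticks of a 90-kHz clock to seconds."""
    return ticks / 90000
```

With this conversion the value 180 000 corresponds to 2.0 seconds, matching the example given for the initial pre-decoder buffering period.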

The following media level SDP field is defined for PSS: 

"a=framesize:<payload type number> <width>-<height>" 
This gives the largest video frame size of H.263 streams. 

The frame size field in SDP is needed by the client in order to properly allocate frame buffer memory. For MPEG-4 
visual streams, the frame size shall be extracted from the "config" information in the SDP. For H.263 streams, a PSS 
server shall include the "a=framesize" field at the media level for each stream in SDP, and a PSS client should interpret 
this field, if present. Clients should be ready to receive SDP descriptions without this attribute. 

If this attribute is present, the frame size parameters shall exactly match the largest frame size defined in the video 
stream. The width and height values shall be expressed in pixels. 
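
Parsing of the "a=framesize" attribute can be illustrated with the following Python sketch. The function names are hypothetical, and the buffer size calculation assumes a YUV 4:2:0 frame layout (12 bits per pixel), which is an assumption of this sketch and not a requirement of the present document.

```python
def parse_framesize(line):
    """Parse 'a=framesize:<payload type> <width>-<height>'."""
    prefix = "a=framesize:"
    if not line.startswith(prefix):
        raise ValueError(f"not a framesize attribute: {line!r}")
    payload_type, size = line[len(prefix):].split()
    width, height = size.split("-")
    return int(payload_type), int(width), int(height)


def yuv420_frame_bytes(width, height):
    """Bytes needed for one frame, assuming YUV 4:2:0 sampling."""
    return width * height * 3 // 2
```

For a QCIF stream signalled as "a=framesize:96 176-144" the sketch returns payload type 96 and a frame size of 176 by 144 pixels.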

5.4 MIME media types 

For continuous media (speech, audio and video) the following MIME media types shall be used: 

- AMR narrow-band speech codec (see clause 7.2) MIME media type as defined in [11]; 

- AMR wideband speech codec (see clause 7.2) MIME media type as defined in [11]; 

- MPEG-4 AAC audio codec (see clause 7.3) MIME media type as defined in RFC 3016 [13]. When used in SDP 
the attribute "cpresent" SHALL be set to "0" indicating that the configuration information is only carried out of 
band in the SDP "config" parameter; 

- MPEG-4 video codec (see clause 7.4) MIME media type as defined in RFC 3016 [13]. When used in SDP the 
configuration information shall be carried outband in the "config" SDP parameter and inband (as stated in RFC 
3016). As described in RFC 3016, the configuration information sent inband and the config information in the 
SDP shall be the same, except for first_half_vbv_occupancy and latter_half_vbv_occupancy, which, if they exist, 
may vary in the configuration information sent inband; 

- H.263 [22] video codec (see clause 7.4) MIME media type as defined in annex C, clause C.1 of the present 
document. 

MIME media types for JPEG, GIF, PNG, SP-MIDI, SVG, timed text and XHTML can be used both in the "Content- 
type" field in HTTP and in the "type" attribute in SMIL 2.0. The following MIME media types shall be used for these 
media: 

- JPEG (see clause 7.5) MIME media type as defined in [15]; 

- GIF (see clause 7.6) MIME media type as defined in [15]; 

- PNG (see clause 7.6) MIME media type as defined in [38]; 

- SP-MIDI (see clause 7.3A) MIME media type as defined in clause C.2 in Annex C of the present document; 

- SVG (see clause 7.7) MIME media type as defined in [42]; 

- XHTML (see clause 7.8) MIME media type as defined in [16]; 

- Timed text (see clause 7.9) MIME media type as defined in clause D.9 in Annex D of the present document. 

MIME media type used for SMIL files shall be according to [31] and for SDP files according to [6]. 



6 Data transport 

6.1 Packet based network interface 

PSS clients and servers shall support an IP-based network interface for the transport of session control and media data. 
Control and media data are sent using TCP/IP [8] and UDP/IP [7]. An overview of the protocol stack can be found in 
figure 2 of the present document. 

6.2 RTP over UDP/IP 

The IETF RTP ([9] and [10]) provides a means for sending real-time or streaming data over UDP (see [7]). The encoded 
media is encapsulated in the RTP packets with media specific RTP payload formats. RTP payload formats are defined 
by IETF. RTP also provides a protocol called RTCP (see clause 6 in [9]) for feedback about the transmission quality. 
For the calculation of the RTCP transmission interval Annex A.7 in [9] shall be used. Clause A.3.2.3 in Annex A of the 
present document provides more information about the minimum RTCP transmission interval. 

RTP/UDP/IP transport of continuous media (speech, audio and video) shall be supported. 

For RTP/UDP/IP transport of continuous media the following RTP payload formats shall be used: 

- AMR narrow-band speech codec (see clause 7.2) RTP payload format according to [11]. A PSS client is not 
required to support multi-channel sessions; 

- AMR wideband speech codec (see clause 7.2) RTP payload format according to [11]. A PSS client is not 
required to support multi-channel sessions; 

- MPEG-4 AAC audio codec (see clause 7.3) RTP payload format according to RFC 3016 [13]; 

- MPEG-4 video codec (see clause 7.4) RTP payload format according to RFC 3016 [13]; 

- H.263 video codec (see clause 7.4) RTP payload format according to RFC 2429 [14]. 

NOTE: The payload format RFC 3016 for MPEG-4 AAC specifies that the audio streams shall be formatted by the 
LATM (Low-overhead MPEG-4 Audio Transport Multiplex) tool [21]. It should be noted that the 
references for the LATM format in the RFC 3016 [13] point to an older version of the LATM format than 
included in [21]. In [21] a corrigendum to the LATM tool is included. This corrigendum includes changes 
to the LATM format making implementations using the corrigendum incompatible with implementations 
not using it. To avoid future interoperability problems, implementations of PSS clients and servers 
supporting AAC shall follow the changes to the LATM format included in [21]. 

6.3 HTTP over TCP/IP 



The IETF TCP provides reliable transport of data over IP networks, but with no delay guarantees. It is the preferred way 
for sending the scene description, text, bitmap graphics and still images. There is also a need for an application protocol 
to control the transfer. The IETF HTTP [17] provides this functionality. 

HTTP/TCP/IP transport shall be supported for: 

still images (see clause 7.5); 

bitmap graphics (see clause 7.6); 

synthetic audio (see clause 7.3A); 

vector graphics (see clause 7.7); 

text (see clause 7.8); 

timed text (see clause 7.9); 

scene description (see clause 8); 

presentation description (see clause 5.3.3). 



6.4 Transport of RTSP 

Transport of RTSP shall be supported according to RFC 2326 [5]. 



7 Codecs 



7.1 General 



For PSS offering a particular media type, media decoders are specified in the following clauses. 



7.2 Speech 



The AMR decoder shall be supported for narrow-band speech [18]. The AMR wideband speech decoder [20] shall be 
supported when wideband speech working at 16 kHz sampling frequency is supported. 



7.3 Audio 



MPEG-4 AAC Low Complexity (AAC-LC) object type decoder [21] should be supported. The maximum sampling rate 
to be supported by the decoder is 48 kHz. The channel configurations to be supported are mono (1/0) and stereo (2/0). 
In addition, the MPEG-4 AAC Long Term Prediction (AAC-LTP) object type decoder may be supported. 

When a server offers an AAC-LC or AAC-LTP stream with the specified restrictions, it shall include the "profile-level- 
id" and "object" MIME parameters in the SDP "a=fmtp" line. The following values shall be used: 



Object Type    profile-level-id    object
AAC-LC         15                  2
AAC-LTP        15                  4
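
As an illustration, the values in the table could be turned into an SDP "a=fmtp" line as in the following Python sketch. The function name is hypothetical, the ordering and "; " separator of the parameters are choices of this sketch, and the "config" value used in any call is a placeholder rather than a real configuration string. "cpresent=0" follows the requirement in clause 5.4.

```python
# (profile-level-id, object) per object type, taken from the table above.
AAC_FMTP_VALUES = {
    "AAC-LC": (15, 2),
    "AAC-LTP": (15, 4),
}


def aac_fmtp_line(payload_type, object_type, config_hex):
    """Build an 'a=fmtp' line for an AAC-LC or AAC-LTP stream.

    config_hex is supplied by the caller; this sketch does not generate it.
    """
    profile_level_id, obj = AAC_FMTP_VALUES[object_type]
    return (
        f"a=fmtp:{payload_type} profile-level-id={profile_level_id}; "
        f"object={obj}; cpresent=0; config={config_hex}"
    )
```

An object type other than the two listed raises KeyError, which reflects that only AAC-LC and AAC-LTP are covered by the table.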

7.3A Synthetic audio 



The Scalable Polyphony MIDI (SP-MIDI) content format defined in Scalable Polyphony MIDI Specification [44] and 
the device requirements defined in Scalable Polyphony MIDI Device 5-to-24 Note Profile for 3GPP [45] should be 
supported. 

SP-MIDI content is delivered in the structure specified in Standard MIDI Files 1.0 [46], either in format 0 or format 1. 

7.4 Video 

ITU-T Recommendation H.263 [22] profile 0 level 10 shall be supported. This is the mandatory video decoder for the 
PSS. In addition, PSS should support: 

- H.263 [23] Profile 3 Level 10 decoder; 

- MPEG-4 Visual Simple Profile Level 0 decoder, [24] and [25]. 

These two video decoders are optional to implement. 

An optional video buffer model is given in Annex G of the present document. 

NOTE: ITU-T Recommendation H.263 profile 0 has been mandated to ensure that video-enabled PSS supports a 
minimum baseline video capability. Both H.263 and MPEG-4 visual decoders can decode an H.263 
profile 0 bitstream. It is strongly recommended, though, that an H.263 profile 0 bitstream is transported 
and stored as H.263 and not as MPEG-4 visual (short header), as MPEG-4 visual is not mandated by PSS. 



7.5 Still images 



ISO/IEC JPEG [26] together with JFIF [27] decoders shall be supported. The support for ISO/IEC JPEG applies only to 
the following two modes: 

- baseline DCT, non-differential, Huffman coding, as defined in table B.1, symbol 'SOF0' in [26]; 

- progressive DCT, non-differential, Huffman coding, as defined in table B.1, symbol 'SOF2' in [26]. 

7.6 Bitmap graphics 

The following bitmap graphics decoders should be supported: 

- GIF87a, [32]; 

- GIF89a, [33]; 

- PNG, [38]. 

7.7 Vector graphics 

The SVG Tiny profile [42] [43] shall be supported. In addition SVG Basic profile [42] [43] may be supported. 

7.8 Text 

The text decoder is intended to enable formatted text in a SMIL presentation. A PSS client shall support 

- text formatted according to XHTML Mobile Profile [47]; 

- rendering a SMIL presentation where text is referenced with the SMIL 2.0 "text" element together with the SMIL 
2.0 "src" attribute. 

The following character coding formats shall be supported: 

- UTF-8, [30]; 

- UCS-2, [29]. 

NOTE: Since both SMIL and XHTML are XML based languages it would be possible to define a SMIL plus 
XHTML profile. In contrast to the presently defined PSS4 SMIL Language Profile, which only contains SMIL 
modules, such a profile would also contain XHTML modules. No combined SMIL and XHTML profile is 
specified for PSS. Rendering of such documents is out of the scope of the present document. 

7.9 Timed text 

If timed text is supported, PSS clients shall support Annex D, clause D.8a, of this specification. There is no support for 
RTP transport of timed text in this release; 3GP files containing timed text may only be downloaded. 

NOTE: When a PSS client supports timed text it needs to be able to receive and parse 3GP files containing the 
text streams. This does not imply a requirement on PSS clients to be able to render other continuous 
media types contained in 3GP files, e.g. AMR and H.263, if such media types are included in a 
presentation together with timed text. Audio and video are instead streamed to the client using RTSP/RTP 
(see clause 6.2). 



8 Scene description 

8.1 General 

The 3GPP PSS uses a subset of SMIL 2.0 [31] as format of the scene description. PSS clients and servers with support 
for scene descriptions shall support the 3GPP PSS SMIL Language Profile defined in clause 8.2 (abbreviated 3GPP PSS 
SMIL). This profile is a subset of the SMIL 2.0 Language Profile, but a superset of the SMIL 2.0 Basic Language 
Profile. The present document also includes an informative Annex B that provides guidelines for SMIL content authors. 

NOTE: The interpretation of this is not that all streaming sessions are required to use SMIL. For some types of 
sessions, e.g. consisting of one single continuous media or two media synchronised by using RTP 
timestamps, SMIL may not be needed. 

8.2 3GPP PSS SMIL Language Profile 

8.2.1 Introduction 

3GPP PSS SMIL is a markup language based on SMIL Basic [31] and SMIL Scalability Framework. 

3GPP PSS SMIL consists of the modules required by SMIL Basic Profile (and SMIL 2.0 Host Language Conformance) 
and additional MediaAccessibility, MediaDescription, MediaClipping, MetaInformation, PrefetchControl, EventTiming 
and BasicTransitions modules. All of the following modules are included: 

- SMIL 2.0 Content Control Modules - BasicContentControl, SkipContentControl and PrefetchControl 

- SMIL 2.0 Layout Module - BasicLayout 

- SMIL 2.0 Linking Module - BasicLinking, LinkingAttributes 

- SMIL 2.0 Media Object Modules - BasicMedia, MediaClipping, MediaAccessibility and MediaDescription 

- SMIL 2.0 Metainformation Module - MetaInformation 

- SMIL 2.0 Structure Module - Structure 

- SMIL 2.0 Timing and Synchronization Modules - BasicInlineTiming, MinMaxTiming, BasicTimeContainers, 
RepeatTiming and EventTiming 

- SMIL 2.0 Transition Effects Module - BasicTransitions 

8.2.2 Document Conformance 

A conforming 3GPP PSS SMIL document shall be a conforming SMIL 2.0 document. 

All 3GPP PSS SMIL documents use the SMIL 2.0 namespace. 

<smil xmlns="http://www.w3.org/2001/SMIL20/Language"> 

3GPP PSS SMIL documents may declare requirements using the systemRequired attribute: 

EXAMPLE 1: <smil xmlns="http://www.w3.org/2001/SMIL20/Language" 
xmlns:EventTiming="http://www.w3.org/2000/SMIL20/CR/EventTiming" 
systemRequired="EventTiming"> 

Namespace URI http://www.3gpp.org/SMIL20/PSS5/ identifies the version of the 3GPP PSS SMIL profile described in 
the present document. Authors may use this URI to indicate requirement for exact 3GPP PSS SMIL semantics for a 
document or a subpart of a document: 

EXAMPLE 2: <smil xmlns="http://www.w3.org/2001/SMIL20/Language" 
xmlns:pss5="http://www.3gpp.org/SMIL20/PSS5/" 
systemRequired="pss5"> 

The content authors should generally not include the PSS requirement in the document unless the SMIL document relies 
on PSS specific semantics that are not part of the W3C SMIL. The reason for this is that SMIL players that are not 
conforming 3GPP PSS user agents may not recognize the PSS URI and thus refuse to play the document. 
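
A player could determine which namespaces a document requires as in the following Python sketch. It assumes that only the systemRequired attribute of the root element is inspected and that the attribute holds namespace prefixes separated by "+"; the function name is hypothetical. An undeclared prefix raises KeyError in this sketch.

```python
import io
import xml.etree.ElementTree as ET

PSS5_URI = "http://www.3gpp.org/SMIL20/PSS5/"


def required_namespace_uris(smil_text):
    """Return the namespace URIs named by systemRequired on the root."""
    prefixes = {}
    source = io.BytesIO(smil_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # Namespace declarations are reported before the element start.
            prefix, uri = payload
            prefixes[prefix] = uri
        else:
            # The first "start" event is the root element.
            value = payload.get("systemRequired", "")
            return [prefixes[p] for p in value.split("+") if p]
    return []
```

A player that does not recognise one of the returned URIs may refuse to play the document, which is why the PSS URI should only be required when PSS specific semantics are relied upon.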

8.2.3 User Agent Conformance 

A conforming 3GPP PSS SMIL user agent shall be a conforming SMIL Basic User Agent. 

A conforming user agent shall implement the semantics of 3GPP PSS SMIL as described in clauses 8.2.4 and 8.2.5 
(including subclauses). 

A conforming user agent shall recognise 

- the URIs of all included SMIL 2.0 modules; 

- the URI http://www.3gpp.org/SMIL20/PSS5/ as referring to all modules and semantics of the version of the 
3GPP PSS SMIL profile described in the present document; 

- the URI http://www.3gpp.org/SMIL20/PSS4/ as referring to all modules and semantics of the 3GPP PSS SMIL 
profile defined in Release 4 of the present document. 

NOTE: The difference between PSS4 and PSS5 is that the BasicTransitions module has been added in PSS5. 

8.2.4 3GPP PSS SMIL Language Profile definition 

3GPP PSS SMIL is based on SMIL 2.0 Basic language profile [31]. This chapter defines the content model and 
integration semantics of the included modules where they differ from those defined by SMIL Basic. 

8.2.4.1 Content Control Modules 

3GPP PSS SMIL includes the content control functionality of the BasicContentControl, SkipContentControl and 
PrefetchControl modules of SMIL 2.0. PrefetchControl is not part of SMIL Basic and is an additional module in this 
profile. 

All BasicContentControl attributes listed in the module specification shall be supported. 

NOTE: The SMIL specification [31] defines that all functionality of the PrefetchControl module is optional. This 
means that although PrefetchControl is mandatory, user agents may implement the semantics of the 
PrefetchControl module only partially, or not implement them at all. 

The PrefetchControl module adds the prefetch element to the content model of SMIL Basic body, switch, par and seq 
elements. The prefetch element has the attributes defined by the PrefetchControl module (mediaSize, mediaTime and 
bandwidth), the src attribute, the BasicContentControl attributes and the skip-content attribute. 

8.2.4.2 Layout Module 

3GPP PSS SMIL includes the BasicLayout module of SMIL 2.0 for spatial layout. The module is part of SMIL Basic. 
Default values of the width and height attributes for root-layout shall be the dimensions of the device display area. 

8.2.4.3 Linking Module 

3GPP PSS SMIL includes the SMIL 2.0 BasicLinking and LinkingAttributes modules for providing hyperlinks between 
documents and document fragments. The BasicLinking module is from SMIL Basic. 

When linking to destinations outside the current document, implementations may ignore values "play" and "pause" of 
the 'sourcePlaystate' attribute and values "new" and "pause" of the 'show' attribute, instead using the semantics of values 
"stop" and "replace" respectively. When the values of 'sourcePlaystate' and 'show' are ignored the player may also 
ignore the 'sourceLevel' attribute since it is of no use then. 

8.2.4.4 Media Object Modules 

3GPP PSS SMIL includes the media elements from the SMIL 2.0 BasicMedia module and attributes from the 
MediaAccessibility, MediaDescription and MediaClipping modules. The MediaAccessibility, MediaDescription and 
MediaClipping modules are additions in this profile to SMIL Basic. 

See clause 5.4 for the mandatory and optional MIME types that a 3GPP PSS SMIL player needs to support. 

The MediaClipping module adds to the profile the ability to address sub-clips of continuous media. It 
adds 'clipBegin' and 'clipEnd' (and for compatibility 'clip-begin' and 'clip-end') attributes to all media elements. 

The MediaAccessibility module provides basic accessibility support for media elements. New attributes 'alt', 'longdesc' and 
'readIndex' are added to all media elements by this module. The MediaDescription module is included by the 
MediaAccessibility module and adds 'abstract', 'author' and 'copyright' attributes to media elements. 

8.2.4.5 Metainformation Module 

The Metainformation module of SMIL 2.0 is included in the profile. This module is an addition in this profile to SMIL 
Basic and provides a way to include descriptive information about the document content in the document. 

This module adds the meta and metadata elements to the content model of the SMIL Basic head element. 

8.2.4.6 Structure Module 

The Structure module defines the top-level structure of the document. It is included by SMIL Basic. 

8.2.4.7 Timing and Synchronization modules 

The timing modules included in the 3GPP SMIL are BasicInlineTiming, MinMaxTiming, BasicTimeContainers, 
RepeatTiming and EventTiming. The EventTiming module is an addition in this profile to the SMIL Basic. 

For 'begin' and 'end' attributes either a single offset-value or a single event-value shall be allowed. Offsets shall not be 
supported with event-values. 

Event timing attributes that reference invalid IDs (for example elements that have been removed by the content control) 
shall be treated as being indefinite. 

Supported event names and semantics shall be as defined by the SMIL 2.0 Language Profile. All user agents shall be 
able to raise the following event types: 

- activateEvent; 

- beginEvent; 

- endEvent. 

The following SMIL 2.0 Language event types should be supported: 

- focusInEvent; 

- focusOutEvent; 

- inBoundsEvent; 

- outOfBoundsEvent; 

- repeatEvent. 

User agents shall ignore unknown event types and not treat them as errors. 

Events do not bubble and shall be delivered to the associated media or timed elements only. 
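
The handling of known and unknown event types can be illustrated with the following Python sketch. It assumes that each media or timed element keeps its own mapping from event type to handler, so that an event is delivered to that element only; the function and variable names are hypothetical.

```python
REQUIRED_EVENTS = frozenset({"activateEvent", "beginEvent", "endEvent"})
RECOMMENDED_EVENTS = frozenset({
    "focusInEvent", "focusOutEvent", "inBoundsEvent",
    "outOfBoundsEvent", "repeatEvent",
})


def deliver_event(event_type, target_handlers):
    """Deliver an event to the handlers of its associated element only.

    Returns False for unknown event types, which are ignored rather
    than treated as errors. No parent element is notified (no bubbling).
    """
    if event_type not in REQUIRED_EVENTS | RECOMMENDED_EVENTS:
        return False
    handler = target_handlers.get(event_type)
    if handler is not None:
        handler()
    return True
```

Because the lookup is confined to the handlers of one element, the sketch has no notion of propagation to enclosing time containers.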

8.2.4.8 Transition Effects Module 

3GPP PSS SMIL profile includes the SMIL 2.0 BasicTransitions module to provide a framework for describing 
transitions between media elements. 

NOTE: The SMIL specification [31] defines that all functionality of the BasicTransitions module is optional: 
"Transitions are hints to the presentation. Implementations must be able to ignore transitions if they so 
desire and still play the media of the presentation". This means that although the BasicTransitions 
module is mandatory, user agents may implement the semantics of the BasicTransitions module only partially, 
or not implement them at all. Content authors should use transitions in their SMIL presentation where 
this appears useful. User agents that fully support the semantics of the BasicTransitions module will 
render the presentation with the specified transitions. All other user agents will leave out the transitions 
but present the media content correctly. 

User agents that implement the semantics of this module should implement at least the following transition effects 
described in SMIL 2.0 specification [31]: 

barWipe; 

irisWipe; 

clockWipe; 

snakeWipe; 

pushWipe; 

slideWipe; 

fade. 

A user agent should implement the default subtype of these transition effects. 

A user agent that implements the semantics of this module shall at least support transition effects for non-animated 
image media elements. For purposes of the Transition Effects modules, two media elements are considered overlapping 
when they occupy the same region. 

The BasicTransitions module adds attributes 'transIn' and 'transOut' to the media elements of the Media Objects modules, 
and value "transition" to the set of legal values for the 'fill' attribute of the media elements. It also adds the transition 
element to the content model of the head element. 

8.2.5 Content Model 

This table shows the full content model and attributes of the 3GPP PSS SMIL profile. The attribute collections used are 
defined by SMIL Basic ([31], SMIL Host Language Conformance requirements, chapter 2.4). Changes to SMIL Basic 
are shown in bold. 

Table 1: Content model for the 3GPP PSS SMIL profile 

Element                       Elements                  Attributes

smil                          head, body                COMMON-ATTRS, CONTCTRL-ATTRS, xmlns

head                          layout, switch, meta,     COMMON-ATTRS
                              metadata, transition

body                          TIMING-ELMS, MEDIA-ELMS,  COMMON-ATTRS
                              switch, a, prefetch

layout                        root-layout, region       COMMON-ATTRS, CONTCTRL-ATTRS, type

root-layout                   EMPTY                     COMMON-ATTRS, backgroundColor, height, width,
                                                        skip-content

region                        EMPTY                     COMMON-ATTRS, backgroundColor, bottom, fit, height,
                                                        left, right, showBackground, top, width, z-index,
                                                        skip-content, regionName

ref, animation, audio, img,   area                      COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, repeat,
video, text, textstream                                 region, MEDIA-ATTRS, clipBegin(clip-begin),
                                                        clipEnd(clip-end), alt, longDesc, readIndex,
                                                        abstract, author, copyright, transIn, transOut

a                             MEDIA-ELMS                COMMON-ATTRS, LINKING-ATTRS

area                          EMPTY                     COMMON-ATTRS, LINKING-ATTRS, TIMING-ATTRS, repeat,
                                                        shape, coords, nohref

par, seq                      TIMING-ELMS, MEDIA-ELMS,  COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, repeat
                              switch, a, prefetch

switch                        TIMING-ELMS, MEDIA-ELMS,  COMMON-ATTRS, CONTCTRL-ATTRS
                              layout, a, prefetch

prefetch                      EMPTY                     COMMON-ATTRS, CONTCTRL-ATTRS, mediaSize, mediaTime,
                                                        bandwidth, src, skip-content

meta                          EMPTY                     COMMON-ATTRS, content, name, skip-content

metadata                      EMPTY                     COMMON-ATTRS, skip-content

transition                    EMPTY                     COMMON-ATTRS, CONTCTRL-ATTRS, type, subtype,
                                                        startProgress, endProgress, direction, fadeColor,
                                                        skip-content, dur



9 3GPP file format (interchange format for MMS) 



9.1 General 



The 3GPP file format (3GP) is based on the ISO base media file format [50] and is defined in this specification. It is 
mandated in [35] to be used for continuous media along the entire delivery chain envisaged by the MMS, independent 
of whether the final delivery is done by streaming or download, thus enhancing interoperability. 

In particular, the following stages are considered: 

upload from the originating terminal to the MMS proxy; 

file exchange between MMS servers; 

transfer of the media content to the receiving terminal, either by file download or by streaming. In the first case 
the self-contained file is transferred, whereas in the second case the content is extracted from the file and 
streamed according to open payload formats. In this case, no trace of the file format remains in the content that 
goes on the wire/in the air. 
Additionally, the 3GPP file format should be used for the storage in the servers and the "hint track" mechanism may be
used for the preparation for streaming.

Clause 9.2 of the present document gives the requirements to follow for the 3GPP file format used in MMS.
Following these requirements ensures that PSS can interwork with MMS and that the 3GPP file format can be used
internally within the MMS system. PSS servers that do not interwork with MMS are not required to follow these
guidelines.

9.2 3GPP file format conformance 

The 3GPP file format, used in this specification for timed multimedia (such as video, associated audio and timed text), 
is structurally based on the ISO base media file format defined in [50]. However, the conformance statement for 3GP 
files is defined in the present document by addressing the registration of codecs, file identification, file extension and 
MIME type definition. 

NOTE: Codecs or functionalities not conforming to a 3GP file may be ignored. 

9.2.1 Registration of non-ISO codecs 

MPEG-4 video and AAC audio code streams, as well as the non-ISO code streams AMR narrow-band speech, AMR 
wideband speech, H.263 encoded video and timed text can be included in 3GP files as described in annex D of the 
present document. 

9.2.2 Hint tracks 

Hint tracks are a mechanism that a server implementation may choose to use in preparation for the streaming of media 
content contained in 3GP files. However, it should be observed that the usage of hint tracks is an internal 
implementation matter for the server, and it falls outside the scope of the present document. 

9.2.3 Limitations to the ISO base media file format 

The following limitations to the ISO base media file format [50] shall apply to a 3GP file of this specification: 

there shall be no references to external media outside the file, i.e. a 3GP file shall be self-contained; 

the maximum number of tracks shall be one for video, one for audio and one for text; 

the maximum number of sample entries shall be one per track for video and audio (but unrestricted for text); 

compact sample sizes ('stz2') shall not be used; 

movie fragments shall not be used. 

NOTE: If a file contains video or audio tracks with more than one sample entry per track, a reader may skip those 
tracks or the entire file. 
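
As an informal illustration, the limitations above can be expressed as a simple check over a summary of a file. The sketch below is in Python; the function name, the flag names and the (kind, sample entry count) track summary are assumptions made for this example only, and how the summary is extracted from a real file is not shown.

```python
from collections import Counter

MAX_TRACKS_PER_KIND = 1                         # one video, one audio, one text track
SINGLE_SAMPLE_ENTRY_KINDS = {"video", "audio"}  # text tracks are unrestricted

def check_3gp_limits(tracks, uses_stz2=False, has_movie_fragments=False,
                     has_external_refs=False):
    """Return the list of violated limitations (empty if none).

    `tracks` is a list of (kind, sample_entry_count) pairs, where kind is
    "video", "audio" or "text". This summary format is hypothetical.
    """
    problems = []
    if has_external_refs:
        problems.append("file is not self-contained")
    for kind, count in Counter(kind for kind, _ in tracks).items():
        if count > MAX_TRACKS_PER_KIND:
            problems.append(f"more than one {kind} track")
    for kind, entries in tracks:
        if kind in SINGLE_SAMPLE_ENTRY_KINDS and entries > 1:
            problems.append(f"{kind} track has {entries} sample entries")
    if uses_stz2:
        problems.append("compact sample sizes ('stz2') used")
    if has_movie_fragments:
        problems.append("movie fragments used")
    return problems

# A conforming layout: one track of each kind, several text sample entries.
print(check_3gp_limits([("video", 1), ("audio", 1), ("text", 3)]))
```

A reader could use such a check to decide whether to skip a track or the entire file, as the NOTE above permits.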

9.2.4 MPEG-4 systems specific elements 

For the storage of MPEG-4 media specific information in 3GP files, this specification refers to MP4 [51], which is also
based on the ISO base media file format. However, tracks relating to MPEG-4 system architectural elements (e.g. BIFS
scene description tracks or OD object descriptors) are optional in 3GP files and shall be ignored. The inclusion of
MPEG-4 media does not imply the usage of MPEG-4 systems architecture. The receiving terminal is not required to
implement any of the specific MPEG-4 system architectural elements.

9.2.5 Interpretation of 3GPP file format 

All index numbers used in the 3GPP file format start with the value one rather than zero, in particular "first-chunk" in 
Sample to chunk box, "sample-number" in Sync sample box and "shadowed-sample-number", "sync-sample-number" 
in Shadow sync sample box. 
9.2.6 3GPP file authoring guidelines for progressive download

The present document specifies the 3GPP file format to be used for distribution of continuous media clips using MMS.
The same file format can also be used for file download from any server. However, to achieve a better response to a 
user request for media, it is often advantageous if the client can start playing the media before the full file is 
downloaded. This scenario is known as progressive download and is provided by many proprietary media solutions. It is 
the purpose of this clause to point out that this is also easily achievable by using the 3GPP file format. This possibility 
has been inherent in the file format from the first version in Release 4, and the only thing that is needed is that the 
content creator follows the guidelines provided here. 

The principles behind progressive download are that the session information should be put at the beginning of the file 
and that the media tracks should be interleaved within the file. In practice, this leads to the following guidelines for the 
creation of 3GP files: 

the 'moov' box should be placed at the start of the file, right after the 'ftyp' box; 

the media tracks should be interleaved inside the file. The typical interleaving length is one second. 

For the release-4 file format, the boxes are called atoms but, except for that, everything applies equally well. 

It should be noted that no change is needed at the server side, and that a client that does not support progressive 
download can always play the file once it has been completely downloaded. A progressive download client can start 
playing a 3GP file that has been created along the progressive download guidelines once it has received a first chunk of 
all media in the session. If the file has not been prepared for progressive download, the client will always need to wait 
for the full download. 
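
The first guideline can be checked mechanically by walking the top-level boxes of a file. The Python sketch below assumes the usual ISO base media box header (32-bit big-endian size followed by a four-character type, with size 1 indicating a 64-bit size and size 0 a box that runs to the end of the file). The helper names and the synthetic test data are illustrative only, and interleaving of the media tracks is not examined.

```python
import struct

def top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box in the buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:                      # 64-bit size follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:                    # box extends to the end of the file
            size = len(data) - offset
        if size < header:
            break                          # malformed box: stop instead of looping
        yield box_type.decode("latin-1"), offset, size
        offset += size

def follows_progressive_guidelines(data: bytes) -> bool:
    """True if 'moov' comes right after 'ftyp' at the start of the file."""
    order = [box_type for box_type, _, _ in top_level_boxes(data)]
    return order[:2] == ["ftyp", "moov"]

def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a toy box for testing; the payloads here are placeholders."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

good = make_box(b"ftyp", b"test") + make_box(b"moov") + make_box(b"mdat", b"\x00" * 4)
bad = make_box(b"ftyp", b"test") + make_box(b"mdat", b"\x00" * 4) + make_box(b"moov")
print(follows_progressive_guidelines(good), follows_progressive_guidelines(bad))
```

With 'moov' placed after 'mdat', as in the second buffer, a client has to receive the whole file before it can locate the session information.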
Annex A (informative):
Protocols 

A.1 SDP 

This clause gives some background information on SDP for PSS clients. 

Table A.1 provides an overview of the different SDP fields that can be identified in an SDP file. The order of SDP fields
is mandated as specified in RFC 2327 [6].

Table A.1: Overview of fields in SDP for PSS clients

Type | Description | Requirement according to [6] | Requirement according to the present document
-----|-------------|------------------------------|----------------------------------------------
Session Description
V | Protocol version | R | R
O | Owner/creator and session identifier | R | R
S | Session Name | R | R
I | Session information | O | O
U | URI of description | O | O
E | Email address | O | O
P | Phone number | O | O
C | Connection Information | R | R
B | Bandwidth information: AS | O | O
B | Bandwidth information: RS | ND | O
B | Bandwidth information: RR | ND | O
One or more Time Descriptions (see below)
Z | Time zone adjustments | O | O
K | Encryption key | O | O
A | Session attributes: control | O | R
A | Session attributes: range | O | R
One or more Media Descriptions (see below)
Time Description
T | Time the session is active | R | R
R | Repeat times | O | O
Media Description
M | Media name and transport address | R | R
I | Media title | O | O
C | Connection information | R | R
B | Bandwidth information: AS | O | R
B | Bandwidth information: RS | ND | R
B | Bandwidth information: RR | ND | R
K | Encryption Key | O | O
A | Attribute Lines: control | O | R
A | Attribute Lines: range | O | R
A | Attribute Lines: fmtp | O | R
A | Attribute Lines: rtpmap | O | R
A | Attribute Lines: X-predecbufsize | ND | O
A | Attribute Lines: X-initpredecbufperiod | ND | O
A | Attribute Lines: X-initpostdecbufperiod | ND | O
A | Attribute Lines: X-decbyterate | ND | O
A | Attribute Lines: framesize | ND | R (see note 5)

Note 1: R = Required, O = Optional, ND = Not Defined

Note 2: The "c" type is only required on the session level if not present on the media level.

Note 3: The "c" type is only required on the media level if not present on the session level.

Note 4: According to RFC 2327, either an 'e' or 'p' field must be present in the SDP description. On the
other hand, both fields will be made optional in the future release of SDP. So, for the sake
of robustness and maximum interoperability, either an 'e' or 'p' field shall be present during
the server's SDP file creation, but the client should also be ready to receive SDP content
containing neither 'e' nor 'p' fields.

Note 5: The "framesize" attribute is only required for H.263 streams.



The example below shows an SDP file that could be sent to a PSS client to initiate unicast streaming of an H.263 video
sequence.

EXAMPLE:

    v=0
    o=ghost 2890844526 2890842807 IN IP4 192.168.10.10
    s=3GPP Unicast SDP Example
    i=Example of Unicast SDP file
    u=http://www.infoserver.com/ae600
    e=ghost@mailserver.com
    c=IN IP4 0.0.0.0
    t=0 0
    a=range:npt=0-45.678
    m=video 1024 RTP/AVP 96
    b=AS:128
    a=rtpmap:96 H263-2000/90000
    a=fmtp:96 profile=3;level=10
    a=control:rtsp://mediaserver.com/movie.3gp/trackID=1
    a=framesize:96 176-144
    a=recvonly
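
To illustrate how a client might separate session-level fields from media-level fields in such a description, the following Python sketch splits an SDP text at each "m=" line and collects the "a=" attributes of the first media section. The function names are assumptions made for this example, the parser does no validation beyond the "type=value" shape, and it checks only three of the media-level attributes that Table A.1 marks as required.

```python
def parse_sdp(text: str):
    """Split an SDP description into session-level lines and media sections."""
    session, media = [], []
    current = session
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue                       # skip blank or malformed lines
        kind, value = line[0], line[2:]
        if kind == "m":                    # every m= line opens a new media section
            current = []
            media.append(current)
        current.append((kind, value))
    return session, media

def attributes(lines):
    """Collect a= lines as a dict; flag attributes map to an empty string."""
    result = {}
    for kind, value in lines:
        if kind == "a":
            name, _, rest = value.partition(":")
            result[name] = rest.strip()
    return result

EXAMPLE = """\
v=0
o=ghost 2890844526 2890842807 IN IP4 192.168.10.10
s=3GPP Unicast SDP Example
c=IN IP4 0.0.0.0
t=0 0
a=range:npt=0-45.678
m=video 1024 RTP/AVP 96
b=AS:128
a=rtpmap:96 H263-2000/90000
a=fmtp:96 profile=3;level=10
a=control:rtsp://mediaserver.com/movie.3gp/trackID=1
a=framesize:96 176-144
a=recvonly
"""

session, media = parse_sdp(EXAMPLE)
video = attributes(media[0])
missing = [name for name in ("control", "rtpmap", "fmtp") if name not in video]
print(video["rtpmap"], missing)
```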



A.2 RTSP 



A.2.1 General 

Clause 5.3.2 of the present document defines the required RTSP support in PSS clients and servers by making
references to Appendix D of [5]. The current clause gives an overview of the methods (see Table A.2) and headers (see
Table A.3) that are specified in the referenced Appendix D. An example of an RTSP session is also given.

Table A.2: Overview of the required RTSP method support

Method | Requirement for a minimal on-demand playback client according to [5] | Requirement for a PSS client according to the present document | Requirement for a minimal on-demand playback server according to [5] | Requirement for a PSS server according to the present document
-------|------|------|------|------
OPTIONS | O | O | Respond | Respond
REDIRECT | Respond | Respond | O | O
DESCRIBE | O | Generate | O | Respond
SETUP | Generate | Generate | Respond | Respond
PLAY | Generate | Generate | Respond | Respond
PAUSE | Generate | Generate | Respond | Respond
TEARDOWN | Generate | Generate | Respond | Respond

NOTE 1: O = Support is optional.

NOTE 2: 'Generate' means that the client/server is required to generate the request where applicable.

NOTE 3: 'Respond' means that the client/server is required to properly respond to the request.

Table A.3: Overview of the required RTSP header support

Header | Requirement for a minimal on-demand playback client according to [5] | Requirement for a PSS client according to the present document | Requirement for a minimal on-demand playback server according to [5] | Requirement for a PSS server according to the present document
-------|------|------|------|------
Connection | include/understand | include/understand | include/understand | include/understand
Content-Encoding | understand | understand | include | include
Content-Language | understand | understand | include | include
Content-Length | understand | understand | include | include
Content-Type | understand | understand | include | include
CSeq | include/understand | include/understand | include/understand | include/understand
Location | understand | understand | O | O
Public | O | O | include | include
Range | O | include/understand | understand | include/understand
Require | O | O | understand | understand
RTP-Info | understand | understand | include | include
Session | include | include | understand | understand
Timestamp | O | O | include/understand | include/understand
Transport | include/understand | include/understand | include/understand | include/understand
User-Agent (see NOTE 4) | O | O | O | O

NOTE 1: O = Support is optional.

NOTE 2: 'include' means that the client/server is required to include the header in a request or response where
applicable.

NOTE 3: 'understand' means that the client/server is required to be able to respond properly if the header is received in
a request or response.

NOTE 4: According to [5] the "User-Agent" header is not strictly required for a minimal RTSP client implementation,
although it is highly recommended that it is included with requests. The same applies to a PSS client
according to the present document.



The example below is intended to give some more understanding of how RTSP and SDP are used within the 3GPP PSS. 
The example assumes that the streaming client has the RTSP URL to a presentation consisting of an H.263 video 
sequence and AMR speech. RTSP messages sent from the client to the server are in bold and messages from the server 
to the client in italic. In the example the server provides aggregate control of the two streams. 



EXAMPLE:

    DESCRIBE rtsp://mediaserver.com/movie.test RTSP/1.0
    CSeq: 1
    User-Agent: TheStreamClient/1.1b2

    RTSP/1.0 200 OK
    CSeq: 1
    Content-Type: application/sdp
    Content-Length: 435

    v=0
    o=- 950814089 950814089 IN IP4 144.132.134.67
    s=Example of aggregate control of AMR speech and H.263 video
    e=foo@bar.com
    c=IN IP4 0.0.0.0
    b=AS:77
    t=0 0
    a=range:npt=0-59.3478
    a=control:*
    m=audio 0 RTP/AVP 97
    b=AS:13
    b=RR:350
    b=RS:300
    a=rtpmap:97 AMR/8000
    a=fmtp:97
    a=maxptime:200
    a=control:streamID=0
    m=video 0 RTP/AVP 98
    b=AS:64
    b=RR:2000
    b=RS:1200
    a=rtpmap:98 H263-2000/90000
    a=fmtp:98 profile=3;level=10
    a=control:streamID=1

    SETUP rtsp://mediaserver.com/movie.test/streamID=0 RTSP/1.0
    CSeq: 2
    Transport: RTP/AVP/UDP;unicast;client_port=3456-3457
    User-Agent: TheStreamClient/1.1b2

    RTSP/1.0 200 OK
    CSeq: 2
    Transport: RTP/AVP/UDP;unicast;client_port=3456-3457;server_port=5678-5679
    Session: dfhyrio90llk

    SETUP rtsp://mediaserver.com/movie.test/streamID=1 RTSP/1.0
    CSeq: 3
    Transport: RTP/AVP/UDP;unicast;client_port=3458-3459
    Session: dfhyrio90llk
    User-Agent: TheStreamClient/1.1b2

    RTSP/1.0 200 OK
    CSeq: 3
    Transport: RTP/AVP/UDP;unicast;client_port=3458-3459;server_port=5680-5681
    Session: dfhyrio90llk

    PLAY rtsp://mediaserver.com/movie.test RTSP/1.0
    CSeq: 4
    Session: dfhyrio90llk
    User-Agent: TheStreamClient/1.1b2

    RTSP/1.0 200 OK
    CSeq: 4
    Session: dfhyrio90llk
    Range: npt=0-
    RTP-Info: url=rtsp://mediaserver.com/movie.test/streamID=0;seq=9900;rtptime=4470048,
      url=rtsp://mediaserver.com/movie.test/streamID=1;seq=1004;rtptime=1070549

NOTE: Headers can be folded onto multiple lines if the continuation line begins with a space or
horizontal tab. For more information, see RFC 2616 [17].

The user watches the movie for 20 seconds and then decides to fast forward to 10 seconds before
the end...

    PAUSE rtsp://mediaserver.com/movie.test RTSP/1.0
    CSeq: 5
    Session: dfhyrio90llk
    User-Agent: TheStreamClient/1.1b2

    PLAY rtsp://mediaserver.com/movie.test RTSP/1.0
    CSeq: 6
    Range: npt=50-59.3478
    Session: dfhyrio90llk
    User-Agent: TheStreamClient/1.1b2

    RTSP/1.0 200 OK
    CSeq: 5
    Session: dfhyrio90llk

    RTSP/1.0 200 OK
    CSeq: 6
    Session: dfhyrio90llk
    Range: npt=50-59.3478
    RTP-Info: url=rtsp://mediaserver.com/movie.test/streamID=0;seq=39900;rtptime=44470648,
      url=rtsp://mediaserver.com/movie.test/streamID=1;seq=31004;rtptime=41090349

After the movie is over the client issues a TEARDOWN to end the session...

    TEARDOWN rtsp://mediaserver.com/movie.test RTSP/1.0
    CSeq: 7
    Session: dfhyrio90llk
    User-Agent: TheStreamClient/1.1b2

    RTSP/1.0 200 OK
    CSeq: 7
    Session: dfhyrio90llk
    Connection: close
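
The client side of such an exchange can be sketched as a small request formatter that increments CSeq for every request and echoes the Session identifier once the server has assigned one. The Python class below is an illustration only: the class and method names are assumptions, no network I/O is performed, and response parsing is left out.

```python
from typing import Dict, Optional

class RtspRequestBuilder:
    """Formats RTSP/1.0 requests with an incrementing CSeq header."""

    def __init__(self, user_agent: str) -> None:
        self.user_agent = user_agent
        self.cseq = 0
        self.session: Optional[str] = None   # set from the server's SETUP response

    def build(self, method: str, url: str,
              headers: Optional[Dict[str, str]] = None) -> str:
        self.cseq += 1
        lines = [f"{method} {url} RTSP/1.0", f"CSeq: {self.cseq}"]
        if self.session is not None:
            lines.append(f"Session: {self.session}")
        for name, value in (headers or {}).items():
            lines.append(f"{name}: {value}")
        lines.append(f"User-Agent: {self.user_agent}")
        # Header lines end with CRLF; an empty line terminates the header section.
        return "\r\n".join(lines) + "\r\n\r\n"

client = RtspRequestBuilder("TheStreamClient/1.1b2")
base = "rtsp://mediaserver.com/movie.test"
describe = client.build("DESCRIBE", base)
setup = client.build("SETUP", base + "/streamID=0",
                     {"Transport": "RTP/AVP/UDP;unicast;client_port=3456-3457"})
client.session = "dfhyrio90llk"            # value copied from the SETUP response
play = client.build("PLAY", base, {"Range": "npt=50-59.3478"})
print(play)
```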

A.2.2 Implementation guidelines 
A.2.2.1 Usage of persistent TCP 

Considering the potentially long round-trip delays in a packet-switched streaming service over UMTS, it is important to
keep the number of messages exchanged between a server and a client low. The number of requests and responses
exchanged is one of the factors that determine how long it takes from the time that a user initiates PSS until the
streams start playing in a client.

RTSP methods are sent over either TCP or UDP for IP. Both client and server shall support RTSP over TCP whereas
RTSP over UDP is optional. For TCP the connection can be persistent or non-persistent. A persistent connection is used
for several RTSP request/response pairs whereas one connection is used per RTSP request/response pair for the
non-persistent connection. In the non-persistent case each connection will start with the three-way handshake (SYN,
SYN-ACK, ACK) before the RTSP request can be sent. This will increase the time for the message to be sent by one
round-trip delay.

For these reasons it is recommended that 3GPP PSS clients use a persistent TCP connection, at least for the
initial RTSP methods until media starts streaming.
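
A rough worked example shows the size of the effect. Counting only round trips (one per RTSP request/response pair and one per TCP connection set-up), the Python sketch below compares the two cases for the four pairs needed before media flows in the example of clause A.2.1. The round-trip delay of 0.5 seconds is an assumed figure for illustration, and processing and transmission times are ignored.

```python
def setup_delay(pairs: int, rtt_s: float, persistent: bool) -> float:
    """Lower bound on signalling delay in seconds, counting only round trips."""
    connections = 1 if persistent else pairs   # one handshake per TCP connection
    return (connections + pairs) * rtt_s

PAIRS = 4      # DESCRIBE, SETUP (audio), SETUP (video), PLAY
RTT_S = 0.5    # assumed round-trip delay in seconds, illustrative only

print(setup_delay(PAIRS, RTT_S, persistent=True))    # 2.5
print(setup_delay(PAIRS, RTT_S, persistent=False))   # 4.0
```

Under these assumptions the non-persistent case spends three extra round trips on connection set-up alone.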



A.2.2.2 Detecting link aliveness



In the wireless environment, the connection may be lost due to fading, shadowing, loss of battery power, or turning off
the terminal even though the PSS session is active. In order for the server to be able to detect the client's aliveness, the
PSS client should send "wellness" information to the PSS server at a defined interval as described in RFC 2326 [5].
RFC 2326 describes several ways of detecting link aliveness; however, the client should be careful about
issuing a "PLAY method without Range header field" too close to the end of the streams, because it may conflict with
pipelined PLAY requests. Below is the list of recommended "wellness" information for the PSS clients and servers in
prioritised order.

1. RTCP 

2. OPTIONS method with Session header field 

NOTE: Both servers and clients can initiate this OPTIONS method. 

The client should send the same wellness information in 'Ready' state as in 'Playing' and 'Recording' states, and the 
server should detect the same client's wellness information in 'Ready' state as in 'Playing' and 'Recording' states. In 
particular, the same link aliveness mechanism should be managed following a 'PAUSE' request and response. 



A.3 RTP 
A.3.1 General 

Void. 

A.3.2 Implementation guidelines
A.3.2.1 Maximum RTP packet size

The RFC 1889 (RTP) [9] does not impose a maximum size on RTP packets. However, when RTP packets are sent over 
the radio link of a 3GPP PSS system there is an advantage in limiting the maximum size of RTP packets. 

Two types of bearers can be envisioned for streaming using either acknowledged mode (AM) or unacknowledged mode 
(UM) RLC. The AM uses retransmissions over the radio link whereas the UM does not. In UM mode large RTP packets 
are more susceptible to losses over the radio link compared to small RTP packets since the loss of a segment may result 
in the loss of the whole packet. On the other hand in AM mode large RTP packets will result in larger delay jitter 
compared to small packets as there is a larger chance that more segments have to be retransmitted. 

For these reasons it is recommended that the maximum size of RTP packets be limited, taking into
account the wireless link. This will decrease the RTP packet loss rate, particularly for RLC in UM. For RLC in AM the
delay jitter will be reduced, permitting the client to use a smaller receiving buffer. It should also be noted that RTP
packets that are too small could result in too much overhead if IP/UDP/RTP header compression is not applied, or in
unnecessary load at the streaming server.

In the case of transporting video in the payload of RTP packets it may be that a video frame is split into more than one 
RTP packet in order not to produce too large RTP packets. Then, to be able to decode packets following a lost packet in 
the same video frame, it is recommended that synchronisation information be inserted at the start of such RTP packets. 
For H.263 this implies the use of GOBs with non-empty GOB headers and in the case of MPEG-4 video the use of 
video packets (resynchronisation markers). If the optional Slice Structured mode (Annex K) of H.263 is in use, GOBs 
are replaced by slices. 
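
The trade-off described in this clause can be made concrete with a small calculation. The Python sketch below assumes uncompressed IPv4, UDP and RTP headers of 20, 8 and 12 bytes, and models UM loss with independent losses per radio segment. The payload sizes, the segment counts and the independence assumption are illustrative and are not taken from the present document.

```python
IPV4_HEADER = 20   # bytes, without options
UDP_HEADER = 8
RTP_HEADER = 12    # fixed header, no CSRC list or header extension

def header_overhead(payload_bytes: int) -> float:
    """Fraction of each uncompressed IPv4/UDP/RTP packet spent on headers."""
    headers = IPV4_HEADER + UDP_HEADER + RTP_HEADER
    return headers / (headers + payload_bytes)

def packet_loss(segments: int, segment_loss: float) -> float:
    """Loss probability of a packet carried in `segments` radio segments,
    assuming the packet is lost if any one segment is lost and that
    segment losses are independent (a simplifying assumption)."""
    return 1 - (1 - segment_loss) ** segments

for payload in (40, 360, 1400):
    print(payload, f"{header_overhead(payload):.1%}")   # 50.0%, 10.0%, 2.8%

# Larger packets span more segments and are therefore lost more often in UM.
print(packet_loss(2, 0.01) < packet_loss(10, 0.01))     # True
```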

A.3.2.2 Sequence number and timestamp in the presence of NPT jump

The description below is intended to give more understanding of how RTP sequence number and timestamp are 
specified within the 3GPP PSS in the presence of NPT jumps. The jump happens when a client sends a PLAY request 
to skip media. 

RFC 2326 (RTSP) [5] specifies that both RTP sequence numbers and RTP timestamps must be continuous and
monotonic across jumps of NPT. Thus, when a server receives a request for a skip of the media that causes a jump of
NPT, it shall specify RTP sequence numbers and RTP timestamps continuously and monotonically across the skip of
the media to conform to the RTSP specification. Also, the server may respond with "seq" in the RTP-Info field if this
parameter is known at the time of issuing the response.
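
One way to picture this behaviour is a per-stream state that is never reset on a seek: the counters keep running, and the values reported in RTP-Info tie the next RTP timestamp to the new NPT position. The Python sketch below is a simplified model under stated assumptions: the class and function names are hypothetical, playback scale is 1, and packet durations are illustrative.

```python
class RtpStreamState:
    """Keeps RTP sequence numbers and timestamps continuous across seeks."""

    def __init__(self, clock_rate: int, first_seq: int, first_timestamp: int):
        self.clock_rate = clock_rate
        self.next_seq = first_seq
        self.next_timestamp = first_timestamp

    def packet(self, duration_s: float):
        """Return (seq, timestamp) for the next packet and advance both."""
        seq, ts = self.next_seq, self.next_timestamp
        self.next_seq = (seq + 1) % 2**16                     # 16-bit wrap
        self.next_timestamp = (ts + round(duration_s * self.clock_rate)) % 2**32
        return seq, ts

    def seek(self, npt_s: float):
        """Handle a PLAY with a new Range without resetting any counter.

        The returned values are what a server could report in RTP-Info
        (seq, rtptime) for the new NPT position.
        """
        return {"npt": npt_s, "seq": self.next_seq, "rtptime": self.next_timestamp}

def npt_of(timestamp: int, info: dict, clock_rate: int) -> float:
    """Client side: map an RTP timestamp to NPT using the latest RTP-Info."""
    elapsed = (timestamp - info["rtptime"]) % 2**32
    return info["npt"] + elapsed / clock_rate

video = RtpStreamState(clock_rate=90000, first_seq=1004, first_timestamp=1070549)
for _ in range(3):
    video.packet(0.1)            # three packets of 0.1 s each (illustrative)
info = video.seek(50.0)          # the client jumps to npt=50
seq, ts = video.packet(0.1)
print(seq, ts, npt_of(ts, info, 90000))   # 1007 1097549 50.0
```

The sequence number continues at 1007 and the timestamp at 1097549 after the jump; only the mapping from RTP timestamp to NPT changes.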

A.3.2.3 RTCP transmission interval

In RTP [9], Section 6.2, rules for the calculation of the interval between the sending of two consecutive RTCP packets, 
i.e. the RTCP transmission interval, are defined. These rules consist of two steps: 

Step 1: An algorithm that calculates a transmission interval from parameters such as the RTCP bandwidth defined
in section 5.3.3.1 and the average RTCP packet size. This algorithm is described in [9], annex A.7.

Step 2: Taking the maximum of the transmission interval computed in step 1 and a mandatory fixed minimum
RTCP transmission interval of 5 seconds.

Implementations conforming to this TS shall perform step 1 and may perform step 2. All other algorithms and rules of 
[9] stay valid and shall be followed. 

Following these recommendations results in regular sending of RTCP messages, where the interval between them
depends on the RTCP bandwidth and the average RTCP packet size.
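
The effect of making step 2 optional can be seen in a simplified calculation. The Python sketch below keeps only the deterministic core of step 1 (members times average packet size divided by RTCP bandwidth) and leaves out the randomisation and the sender/receiver bandwidth split of the full algorithm in [9]. The function name, the bandwidth unit of bits per second and the 60-byte average packet size are assumptions for illustration.

```python
RTCP_MIN_INTERVAL_S = 5.0   # fixed minimum interval applied in step 2

def rtcp_interval(avg_rtcp_size_bytes: float, rtcp_bandwidth_bps: float,
                  members: int = 1, apply_minimum: bool = False) -> float:
    """Deterministic part of the RTCP transmission interval, in seconds."""
    if rtcp_bandwidth_bps <= 0:
        raise ValueError("RTCP bandwidth must be positive")
    # Step 1 (simplified): time needed to send one average packet per member.
    interval = members * avg_rtcp_size_bytes * 8 / rtcp_bandwidth_bps
    if apply_minimum:
        # Step 2: enforce the fixed 5 second minimum.
        interval = max(interval, RTCP_MIN_INTERVAL_S)
    return interval

# A receiver with 2000 bit/s of RTCP bandwidth and 60-byte reports (assumed size).
print(rtcp_interval(60, 2000))                        # 0.24
print(rtcp_interval(60, 2000, apply_minimum=True))    # 5.0
```

Skipping step 2 lets a receiver report far more often than every 5 seconds when the allocated RTCP bandwidth allows it.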



A.4 Capability exchange 
A.4.1 Overview 

Clause A.4 provides detailed information about the structure and exchange of device capability descriptions for the 
PSS. It complements the normative part contained in clause 5.2 of the present document. 

The functionality is sometimes referred to as capability exchange. Capability exchange in PSS uses the CC/PP [39]
framework and reuses parts of the CC/PP application UAProf [40].

To facilitate server-side content negotiation for streaming, the PSS server needs to have access to a description of the 
specific capabilities of the mobile terminal, i.e. the device capability description. The device capability description 
contains a number of attributes. During the set-up of a streaming session the PSS server can use the description to 
provide the mobile terminal with the correct type of multimedia content. Concretely, it is envisaged that servers use 
information about the capabilities of the mobile terminal to decide which stream(s) to provision to the connecting 
terminal. For instance, the server could compare the requirements on the mobile terminal for multiple available variants 
of a stream with the actual capabilities of the connecting terminal to determine the best-suited stream(s) for that 
particular terminal. A similar mechanism could also be used for other types of content. 

A device capability description contains a number of device capability attributes. In the present document they are
referred to as just attributes. The current version of PSS does not include a definition of any specific user preference
attributes, which is why the term device capability description is used. However, it should be noted that even though no
specific user preference attributes are included, simple tailoring to the preferences of the user could be achieved by
temporary overrides of the available attributes. For example, if the user would like to receive only mono sound for a
particular session even though the terminal is capable of stereo, this can be accomplished by providing an override for
the "AudioChannels" attribute. It should also be noted that the extension mechanism defined would enable an easy
introduction of specific user preference attributes in the device capability description if needed.

The term device capability profile or profile is sometimes used instead of device capability description to describe a 
description of device capabilities and/or user preferences. The three terms are used interchangeably in the present 
document. 

Figure A.1 illustrates how capability exchange in PSS is performed. In the simplest case the mobile terminal informs
the PSS server(s) about its identity so that the latter can retrieve the correct device capability profile(s) from the device
profile server(s). For this purpose, the mobile terminal adds one or several URLs to RTSP and/or HTTP protocol data
units that it sends to the PSS server(s). These URLs point to locations on one or several device profile servers from
where the PSS server should retrieve the device capability profiles. This list of URLs is encapsulated in RTSP and
HTTP protocol data units using additional header field(s). The list of URLs is denoted URLdesc. The mobile terminal
may supplement the URLdesc with extra attributes or overrides for attributes already defined in the profile(s) located at
URLdesc. This information is denoted Profdiff. Like URLdesc, Profdiff is encapsulated in RTSP and HTTP protocol data
units using additional header field(s).
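
The combined effect of URLdesc and Profdiff can be pictured as a merge in which later sources override earlier ones. The Python sketch below models each profile as a mapping from component to attributes; this data model, the function name and the attribute values are assumptions for illustration, and the actual CC/PP and UAProf resolution rules are richer than a plain override.

```python
def resolve_profile(base_profiles, overrides):
    """Merge profiles fetched via URLdesc, then apply Profdiff overrides.

    Each profile is modelled as {component: {attribute: value}} and later
    sources win. The inputs are not modified.
    """
    resolved = {}
    for profile in list(base_profiles) + [overrides]:
        for component, attrs in profile.items():
            resolved.setdefault(component, {}).update(attrs)
    return resolved

# Profile as it might be retrieved from a device profile server (values illustrative).
device = {"Streaming": {"AudioChannels": "Stereo", "PssVersion": "3GPP-R5"}}
# Session-specific override sent by the terminal: the user wants mono sound.
session_diff = {"Streaming": {"AudioChannels": "Mono"}}

print(resolve_profile([device], session_diff))
```

The stored profile is left unchanged, so the override applies to the current session only.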

The device profile server in Figure A.1 is the logical entity that stores the device capability profiles. The profile needed
for a certain request from a mobile terminal may be stored on one or several such servers. A terminal manufacturer or a
software vendor could maintain a device profile server to provide device capability profiles for its products. It would
also be possible for an operator to manage a device profile server for its subscribers and then e.g. enable the subscriber
to make user specific updates to the profiles. The device profile server provides device capability profiles to the PSS
server on request.



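[Figure A.1 components: Mobile terminal; PSS server (Matching); Device profile server (Device capability profiles).
Flows: HTTP/RTSP request including URLdesc and optional profileDiff headers (mobile terminal to PSS server);
HTTP/RTSP replies and multimedia content (PSS server to mobile terminal); HTTP request for a device capability
profile (PSS server to device profile server); HTTP response with device capability profile (device profile server to
PSS server).]

Figure A.1: Functional components in PSS capability exchange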

The PSS server is the logical entity that provides multimedia streams and other, static content (e.g. SMIL documents,
images, and graphics) to the mobile terminal (see Figure A.1). A PSS application might involve multiple PSS servers,
e.g. separate servers for multimedia streams and for static content. A PSS server handles the matching process.
Matching is a process that takes place in the PSS servers (see Figure A.1). The device capability profile is compared
with the content descriptions at the server and the best fit is delivered to the client.
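
Matching rules are left to implementations, so the following Python sketch shows only one possible policy: choose the highest-bitrate variant that the device can handle. All field names, the variant list and the capability values are hypothetical and are not attributes defined by the present document.

```python
def pick_variant(variants, capabilities):
    """Return the highest-bitrate variant the device supports, or None."""
    def supported(variant):
        return (variant["mime"] in capabilities["accept"]
                and variant["width"] <= capabilities["max_width"]
                and variant["height"] <= capabilities["max_height"])

    candidates = [v for v in variants if supported(v)]
    return max(candidates, key=lambda v: v["bitrate_kbps"], default=None)

# Two encodings of the same clip held by the server (illustrative values).
VARIANTS = [
    {"name": "qcif", "mime": "video/H263-2000", "width": 176, "height": 144,
     "bitrate_kbps": 64},
    {"name": "cif", "mime": "video/H263-2000", "width": 352, "height": 288,
     "bitrate_kbps": 128},
]
# Capabilities derived from the resolved device capability profile (hypothetical).
DEVICE = {"accept": {"video/H263-2000", "audio/AMR"},
          "max_width": 176, "max_height": 144}

best = pick_variant(VARIANTS, DEVICE)
print(best["name"] if best else None)   # qcif
```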



A.4.2 Scope of the specification 



The following bullet list describes what is considered to be within the scope of the specification for capability exchange
in PSS.

Definition of the structure for the device capability profiles, see clause A.4.3.

Definition of the CC/PP vocabularies, see clause A.4.4.

Reference to a set of device capability attributes for multimedia content retrieval applications that have
already been defined by UAProf [40]. The purpose of this reference is to point out which attributes are useful
for the PSS application.

Definition of a set of device capability attributes specifically for PSS applications that are missing in UAProf.

It is important to define an extension mechanism to easily add attributes since it is not possible to cover all
attributes from the beginning. The extension mechanism is described in clause A.4.5.

The structure of URLdesc, Profdiff and their interchange is described in clause A.4.6.

Protocols for the interchange of device capability profiles between the PSS server and the device profile server are
defined in clause 5.2.7.

The specification does not include: 

rules for the matching process on the PSS server. These mechanisms should be left to the implementations. For 
interoperability, only the format of the device capability description and its interchange is relevant. 

definition of specific user preference attributes. It is very difficult to standardise such attributes since they are
dependent on the type of personalised services one would like to offer the user. The extensible descriptions
format and exchange mechanism proposed in this document provide the means to create and exchange such
attributes if needed in the future. However, as explained in clause A.4.1, limited tailoring to the preferences of the
user could be achieved by temporarily overriding available attributes in the vocabularies already defined for PSS.
The vocabulary also includes some very basic user preference attributes. For example, the profile includes a list
of preferred languages. Also, the list of MIME types can be interpreted as user preference, e.g. leaving out audio
MIME types could mean that the user does not want to receive any audio content. The available attributes are
described in clause 5.2.3 of the present document.

requirements for caching of device capability profiles on the PSS server. In UAProf, a content server can cache
the current device capability profile for a given WSP session. This feature relies on the presence of WSP
sessions. Caching significantly increases the complexity of both the implementations of the mobile terminal and
the server. However, HTTP is used between the PSS server and the device profile server. For this exchange,
normal content caching provisions as defined by HTTP apply and the PSS server may utilise this to speed up the
session set-up (see clause 5.2.7).

intermediate proxies. This feature is considered not relevant in the context of PSS applications. 

A.4.3 The device capability profile structure 

A device capability profile is a description of the capabilities of the device and possibly also the preferences of the user
of that device. It can be used to guide the adaptation of content presented to the device. A device capability profile for
PSS is an RDF [41] document that follows the structure of the CC/PP framework [39] and the CC/PP application UAProf
[40]. The terminology of CC/PP is used in this text and therefore briefly described here.

Attributes are used for specifying the device capabilities and user preferences. A set of attribute names, permissible
values and semantics constitute a CC/PP vocabulary. An RDF schema defines a vocabulary. The syntax of the attributes
is defined in the schema but also, to some extent, the semantics. A profile is an instance of a schema and contains one or
more attributes from the vocabulary. Attributes in a schema are divided into components distinguished by attribute
characteristics. In the CC/PP specification it is anticipated that different applications will use different vocabularies.
According to the CC/PP framework a hypothetical profile might look like Figure A.2. A further illustration of what a
profile might look like is given in the example in clause A.4.7.

[MyPhone]
 |
 +--ccpp:component--> [TerminalHardware]
 |                      |
 |                      +--rdf:type----------> [prf:HardwarePlatform]
 |                      +--prf:ColorCapable--> "Yes"
 |                      +--prf:BitsPerPixel--> "4"
 |
 +--ccpp:component--> [Streaming]
                        |
                        +--rdf:type----------> [pss:Streaming]
                        +--pss:PssVersion----> "3GPP-R5"

Figure A.2: Illustration of the profile structure

A CC/PP schema is extended through the introduction of new attribute vocabularies and a device capability profile can
use attributes drawn from an arbitrary number of different vocabularies. Each vocabulary is associated with a unique
XML namespace. This mechanism makes it possible to reuse attributes from other vocabularies. It should be mentioned
that the prefix ccpp identifies elements of the CC/PP namespace (URI http://www.w3.org/1999/02/22-rdf-syntax-ns),
prf identifies elements of the UAProf namespace (URI
http://www.wapforum.org/profiles/UAPROF/ccppschema-20010330), rdf identifies elements of the RDF namespace
(URI http://www.w3.org/1999/02/22-rdf-syntax-ns) and pss identifies elements of the Streaming namespace
(URI http://www.3gpp.org/profiles/PSS/ccppschema-PSS5).

Attributes of a component can be included directly or may be specified by a reference to a CC/PP default profile.
Resolving a profile that includes a reference to a default profile is time-consuming: when the PSS server receives the
profile from a device profile server, the final attribute values cannot be determined until the default profile has been
requested and received. Support for defaults is nevertheless required by the CC/PP specification [39]. Because of this
delay, clause 5.2.6 recommends not using the CC/PP defaults element in PSS device capability profile documents.

A.4.4 CC/PP Vocabularies 

A CC/PP vocabulary shall, according to CC/PP and UAProf, include:

An RDF schema for the vocabulary based on the CC/PP schema.

A description of the semantics/type/resolution rules/sample values for each attribute.

A unique namespace shall be assigned to each version of the profile schema.

Additional information that could be included in the profile schema:

A description of the profile schema, i.e. the purpose of the profile, how to use it, when to use it etc.

A description of extensibility, i.e. how to handle future extensions of the profile schema.

A device capability profile can use an arbitrary number of vocabularies, and thus it is possible to reuse attributes from
other vocabularies by simply referencing the corresponding namespaces. The focus of the PSS vocabulary is content
formatting, which overlaps the focus of the UAProf vocabulary. UAProf is specified by the WAP Forum and is an
architecture and vocabulary/schema for capability exchange in the WAP environment. Since there are attributes in the
UAProf vocabulary suitable for streaming applications, these are reused and combined with a PSS application specific
streaming component. This makes the PSS vocabulary an extension vocabulary to UAProf. The CC/PP specification
encourages reuse of attributes from other vocabularies. To avoid confusion, the same attribute name should not be used
in different vocabularies. In clause 5.2.3.3 a number of attributes from UAProf [40] are recommended for PSS. The
PSS base vocabulary is defined in clause 5.2.3.2.

A profile is allowed to instantiate a subset of the attributes in the vocabularies and no specific attributes are required,
but an insufficient description may lead to content that cannot be shown by the client.

A.4.5 Principles of extending a schema/vocabulary 

The use of RDF enables an extensibility mechanism for CC/PP-based schemas that addresses the evolution of new types
of devices and applications. The PSS profile schema specification provides a base vocabulary, but in the
future new usage scenarios might need to express new attributes. It is therefore necessary to
specify how extensions of the schema will be handled. If the TSG responsible for the present document updates the base
vocabulary schema, a new unique namespace will be assigned to the updated schema. In another scenario the TSG may
decide to add a new component containing specific user related attributes. This new component will be assigned a new
namespace and it will not influence the base vocabulary in any way. If other organisations or companies make
extensions, this can be done either as a new component or as attributes added to the existing base vocabulary component,
where the new attributes use a new namespace. This ensures that third parties can define and maintain their own
vocabularies independently from the PSS base vocabulary.

A.4.6 Signalling of profile information between client and server 

URLdesc and Profdiff were introduced in clause A.4.1. The URLdesc is a list of URLs that point to locations on device
profile servers from where the PSS server retrieves suitable device capability profiles. The Profdiff contains additional
capability description information, e.g. overrides for certain attribute values. Both URLdesc and Profdiff are
encapsulated in RTSP and HTTP messages using additional header fields. This can be seen in Figure A.1. In clause 9.1
of [40] three new HTTP headers are defined that can be used to implement the desired functionality: "x-wap-profile",
"x-wap-profile-diff" and "x-wap-profile-warning". These headers are reused in PSS for both HTTP and RTSP.

The "x-wap-profile" is a request header that contains a list of absolute URLs to device capability descriptions
and profile diff names. The profile diff names correspond to additional profile information in the
"x-wap-profile-diff" header.

The "x-wap-profile-diff" is a request header that contains a subset of a device capability profile. 

The "x-wap-profile-warning" is a response header that contains error codes explaining to what extent the server 
has been able to match the terminal request. 

Clause 5.2.5 of the present document defines this exchange mechanism. 

It is left to the mobile terminal to decide when to send x-wap-profile headers. The mobile terminal could send the
"x-wap-profile" and "x-wap-profile-diff" headers with each RTSP DESCRIBE and/or with each RTSP SETUP request.
Sending them in the RTSP DESCRIBE request helps the PSS server to make a better decision about which
presentation description to provide to the client. Sending the "x-wap-profile" and "x-wap-profile-diff" headers with an
HTTP request is useful whenever the mobile terminal requests some multimedia content that will be used in the PSS
application. For example, they can be sent with the request for a SMIL file so that the PSS server can ensure that the mobile
terminal receives a SMIL file which is optimised for the particular terminal. Clause 5.2.5 of the present document gives
recommendations for when profile information should be sent.
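
As an illustration of the signalling described above, the following Python sketch assembles an RTSP DESCRIBE request that carries an "x-wap-profile" header. The function name build_describe, the example URLs and the quoting of each URL are assumptions made for this illustration; the normative header syntax is given in [40] and in clause 5.2.5.

```python
def build_describe(url, cseq, profile_urls):
    """Return an RTSP DESCRIBE request carrying an x-wap-profile header."""
    # Assumption: each URL is enclosed in quotes and entries are comma-separated.
    profile_value = ", ".join('"%s"' % u for u in profile_urls)
    lines = [
        "DESCRIBE %s RTSP/1.0" % url,
        "CSeq: %d" % cseq,
        "x-wap-profile: %s" % profile_value,
        "",  # the empty line terminates the header section
        "",
    ]
    return "\r\n".join(lines)


request = build_describe(
    "rtsp://example.com/clip.3gp",            # hypothetical presentation URL
    1,
    ["http://www.bar.com/Phones/Phone007"],   # profile URL from clause A.4.7
)
print(request)
```

A terminal that splits its description over several documents would simply pass more than one URL in profile_urls.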

It is up to the PSS server to retrieve the device capability profiles using the URLs in the "x-wap-profile" header. The
PSS server is also responsible for merging the profiles it receives. If the "x-wap-profile-diff" header is present, the
server must also merge that information with the retrieved profiles. This functionality is defined in clause 5.2.6.
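
The merging step can be pictured with the following minimal Python sketch. It assumes that each profile has already been parsed into a mapping from component name to attribute values and that a later value simply overrides an earlier one. The actual resolution rules are those of clause 5.2.6 and of the vocabularies, so merge_profiles and the dictionary layout are purely illustrative.

```python
def merge_profiles(retrieved_profiles, profdiff=None):
    """Merge parsed profiles in order; the optional Profdiff is applied last."""
    sources = list(retrieved_profiles)
    if profdiff is not None:
        sources.append(profdiff)
    merged = {}
    for profile in sources:
        for component, attributes in profile.items():
            # Assumption: a later value overrides an earlier one.
            merged.setdefault(component, {}).update(attributes)
    return merged


# Hypothetical parsed profiles, one per URL in the "x-wap-profile" header.
hardware = {"HardwarePlatform": {"BitsPerPixel": "4", "ColorCapable": "Yes"}}
streaming = {"Streaming": {"PssVersion": "3GPP-R5", "RenderingScreenSize": "70x40"}}
# Hypothetical content of an "x-wap-profile-diff" header.
diff = {"Streaming": {"RenderingScreenSize": "73x50"}}

merged = merge_profiles([hardware, streaming], diff)
print(merged["Streaming"]["RenderingScreenSize"])
```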

It should be noted that it is up to the implementation of the mobile terminal which URLs to send in the "x-wap-profile"
header. For instance, a terminal could just send one URL that points to a complete description of its capabilities.
Another terminal might provide one URL that points to a description of the terminal hardware, a second URL that
points to a description of a particular software version of the streaming application, and a third URL that points to the
description of a hardware or software plug-in that is currently added to the standard configuration of that terminal. From
this example it becomes clear that sending URLs from the mobile terminal to the server is sufficient not only for
static profiles but also for re-configurations of the mobile terminal such as software version changes,
software plug-ins, hardware upgrades, etc.

As described above, the list of URLs in the "x-wap-profile" header is a powerful tool to handle dynamic changes of the
mobile terminal. The "x-wap-profile-diff" header could also be used to provide the same functionality. However, using
the "x-wap-profile-diff" header to send, for example, a complete profile (with no URL present at all in the
"x-wap-profile" header) or updates resulting from, for example, a hardware plug-in is not recommended unless some
compression scheme is applied over the air interface. The reason is that the size of a profile may be large.

A.4.7 Example of a PSS device capability description 

The following is an example of a device capability profile as it could be available from a device profile server. The
XML document includes the description of the imaginary "Phone007" phone.

Instead of a single XML document, the description could also be spread over several files. In this case the PSS server
would need to retrieve these profiles separately and merge them. For instance, this would be useful when
the streaming-related device capabilities differ among different versions of the phone. In this
case the streaming part of the profile would be separated from the rest into its own profile document. This separation
allows the difference in streaming capabilities to be described by providing multiple versions of the profile document
for the streaming capabilities.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns"
         xmlns:ccpp="http://www.w3.org/2000/07/04-ccpp"
         xmlns:prf="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010330"
         xmlns:pss5="http://www.3gpp.org/profiles/PSS/ccppschema-PSS5">
  <rdf:Description rdf:about="http://www.bar.com/Phones/Phone007">
    <ccpp:component>
      <rdf:Description ID="HardwarePlatform">
        <rdf:type rdf:resource="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010330#HardwarePlatform" />
        <prf:BitsPerPixel>4</prf:BitsPerPixel>
        <prf:ColorCapable>Yes</prf:ColorCapable>
        <prf:PixelAspectRatio>1x2</prf:PixelAspectRatio>
        <prf:PointingResolution>Pixel</prf:PointingResolution>
        <prf:Model>Phone007</prf:Model>
        <prf:Vendor>Ericsson</prf:Vendor>
      </rdf:Description>
    </ccpp:component>

    <ccpp:component>
      <rdf:Description ID="SoftwarePlatform">
        <rdf:type rdf:resource="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010330#SoftwarePlatform" />
        <prf:CcppAccept-Charset>
          <rdf:Bag>
            <rdf:li>UTF-8</rdf:li>
            <rdf:li>ISO-10646-UCS-2</rdf:li>
          </rdf:Bag>
        </prf:CcppAccept-Charset>
        <prf:CcppAccept-Encoding>
          <rdf:Bag>
            <rdf:li>base64</rdf:li>
            <rdf:li>quoted-printable</rdf:li>
          </rdf:Bag>
        </prf:CcppAccept-Encoding>
        <prf:CcppAccept-Language>
          <rdf:Seq>
            <rdf:li>en</rdf:li>
            <rdf:li>se</rdf:li>
          </rdf:Seq>
        </prf:CcppAccept-Language>
      </rdf:Description>
    </ccpp:component>

    <ccpp:component>
      <rdf:Description ID="Streaming">
        <rdf:type rdf:resource="http://www.3gpp.org/profiles/PSS/ccppschema-PSS5#Streaming" />
        <pss5:AudioChannels>Stereo</pss5:AudioChannels>
        <pss5:VideoPreDecoderBufferSize>30720</pss5:VideoPreDecoderBufferSize>
        <pss5:VideoInitialPostDecoderBufferingPeriod>0</pss5:VideoInitialPostDecoderBufferingPeriod>
        <pss5:VideoDecodingByteRate>16000</pss5:VideoDecodingByteRate>
        <pss5:RenderingScreenSize>73x50</pss5:RenderingScreenSize>
        <pss5:PssAccept>
          <rdf:Bag>
            <rdf:li>audio/AMR-WB; octet-alignment</rdf:li>
            <rdf:li>video/MP4V-ES</rdf:li>
          </rdf:Bag>
        </pss5:PssAccept>
        <pss5:PssAccept-Subset>
          <rdf:Bag>
            <rdf:li>JPEG-PSS</rdf:li>
          </rdf:Bag>
        </pss5:PssAccept-Subset>
        <pss5:PssVersion>3GPP-R5</pss5:PssVersion>
        <pss5:RenderingScreenSize>70x40</pss5:RenderingScreenSize>
        <pss5:SmilBaseSet>SMIL-3GPP-R4</pss5:SmilBaseSet>
        <pss5:SmilModules>
          <rdf:Bag>
            <rdf:li>BasicTransitions</rdf:li>
            <rdf:li>MultiArcTiming</rdf:li>
          </rdf:Bag>
        </pss5:SmilModules>
      </rdf:Description>
    </ccpp:component>
  </rdf:Description>
</rdf:RDF>
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Annex B (informative): 
SMIL authoring guidelines 

B.1 General 

This is an informative annex for SMIL presentation authors. Authors can expect that PSS clients can handle the SMIL
module collection defined in clause 8.2, with the restrictions defined in this Annex. When creating SMIL documents the
author is recommended to consider that terminals may have small displays and simple input devices. The media types
and their encoding included in the presentation should be restricted to what is described in clause 7 of the present
document. Considering that many mobile devices may have limited software and hardware capabilities, the number of
media to be played simultaneously should be limited. For example, many devices will not be able to handle more than one
video sequence at a time.



B.2 BasicLinking 



The Linking Modules define elements and attributes for navigational hyperlinking, either through user interaction or 
through temporal events. The BasicLinking module defines the "a" and "area" elements for basic linking: 

a       Similar to the "a" element in HTML, it provides a link from a media object through the href attribute (which
        contains the URI of the link's destination). The "a" element includes a number of attributes for defining the
        behaviour of the presentation when the link is followed.

area    Whereas the "a" element only allows a link to be associated with a complete media object, the "area" element
        allows links to be associated with spatial and/or temporal portions of a media object.

The area element may be useful for enabling services that rely on interactivity where the display size is not big enough 
to allow the display of links alongside a media (e.g. QCIF video) window. Instead, the user could, for example, click on 
a watermark logo displayed in the video window to visit the company website. 

Although the "area" element may be useful, some mobile terminals will not be able to handle "area" elements that include
multiple selectable regions. One reason for this could be that the terminals do not have the
appropriate user interface. Such "area" elements should therefore be avoided. Instead it is recommended that the "a"
element be used. If the "area" element is used, the SMIL presentation should also include alternative links to navigate
through the presentation; i.e. the author should not create presentations that rely on the player being able to handle
"area" elements.



B.3 BasicLayout 



When defining the layout of a SMIL presentation, a content author needs to be aware that the targeted devices might
have diverse properties that affect how the content can be rendered. The different sizes of the display area that can be
used to render content on the targeted devices should be considered when defining the layout of the SMIL presentation.
The root-layout window might represent the entire display or only parts of it. 

Content authors are encouraged to create SMIL presentations that will work well with different resolutions of the 
rendering area. As mentioned in the SMIL2 recommendation content authors should use SMIL ContentControl 
functionality for defining multiple layouts for their SMIL presentation that are tailored to the specific needs of the 
whole range of targeted devices. Furthermore, authors should include a default layout (i.e. a layout determined by the 
SMIL player) that will be used when none of the author-defined layouts can be used. 

Using relative position and size attributes in the definition of a region is also helpful for making SMIL presentations 
more portable across different display sizes; these features should also be used. 

A 3GPP SMIL player should use the layout definition of a SMIL presentation for presenting the content whenever
possible. When the SMIL player fails to use the layout information defined by the author it is free to present the content
using a layout it determines by itself.

The "fit" attribute defines how different media should be fitted into their respective display regions.

The rendering and layout of some objects on a small display might be difficult, and not all mobile devices support
features such as scroll bars. Therefore "fit=scroll" should not be used except for text content.

Due to hardware restrictions in mobile devices, operations such as scaling of a video sequence, or even of images, may
be very difficult to achieve. According to the SMIL 2.0 specification, SMIL players may in these situations clip the
content instead. To be sure that the presentation is displayed as the author intended, video content should be encoded
in a size suitable for the targeted terminals and it is recommended to use "fit=hidden".
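
The recommendation on "fit=scroll" can be checked mechanically before a presentation is published. The following Python sketch lists non-text media objects that are placed in a region declared with fit="scroll". It assumes that regions are declared with "region" elements carrying "id" and "fit" attributes and that media objects refer to them through a "region" attribute; the function name find_scroll_misuse, the set of media element names and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

MEDIA = {"animation", "audio", "img", "ref", "text", "textstream", "video"}
TEXT_MEDIA = {"text", "textstream"}

# Hypothetical presentation used only for this illustration.
SMIL = """<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <layout>
      <root-layout width="176" height="144"/>
      <region id="v" fit="scroll"/>
      <region id="t" fit="scroll"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="clip.3gp" region="v"/>
      <text src="caption.txt" region="t"/>
    </par>
  </body>
</smil>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_scroll_misuse(smil_text):
    """Return (element name, region id) for non-text media in scrolling regions."""
    root = ET.fromstring(smil_text)
    scroll_regions = {
        el.get("id")
        for el in root.iter()
        if local_name(el.tag) == "region"
        and el.get("fit") == "scroll"
        and el.get("id") is not None
    }
    problems = []
    for el in root.iter():
        name = local_name(el.tag)
        if name in MEDIA and name not in TEXT_MEDIA:
            if el.get("region") in scroll_regions:
                problems.append((name, el.get("region")))
    return problems


print(find_scroll_misuse(SMIL))
```

In the sample document only the video object is reported, since text content is allowed to scroll.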



B.4 EventTiming 



The two attributes "endEvent" and "repeatEvent" in the EventTiming module may cause problems for a mobile SMIL
player. The end of a media element triggers the "endEvent". In the same way, the "repeatEvent" occurs when the second
and subsequent iterations of a repeated element begin playback. Both events rely on the SMIL player receiving
information that the media element has ended. One example is when the end of a video sequence initiates
the event. If the player has not received explicit information about the duration of the video sequence, e.g. by the "dur"
attribute in SMIL or by some external source such as the "a=range" field in SDP, the player will have to rely on the RTCP
BYE message to decide when the video sequence ends. If the RTCP BYE message is lost, the player will have problems
initiating the event. For these reasons it is recommended that the "endEvent" and "repeatEvent" attributes be used with
care and, if used, that the player be provided with some additional information about the duration of the media
element that triggers the event. This additional information could e.g. be the "dur" attribute in SMIL or the "a=range"
field in SDP.

The "inBoundsEvent" and "outOfBoundsEvent" attributes assume that the terminal has a pointer device for moving the 
focus to within a window (i.e. clicking within a window). Not all terminals will support this functionality since they do 
not have the appropriate user interface. Hence care should be taken in using these particular event triggers. 



B.5 MetaInformation



Authors are encouraged to make use of meta data whenever providing such information to the mobile terminal appears 
to be useful. However, they should keep in mind that some mobile terminals will parse but not process the meta data. 

Furthermore, authors should keep in mind that excessive use of meta data will substantially increase the file size of the 
SMIL presentation that needs to be transferred to the mobile terminal. This may result in longer set-up times. 



B.6 XML entities 



Entities are a mechanism to insert XML fragments inside an XML document. Entities can be internal, essentially a 
macro expansion, or external. Use of XML entities in SMIL presentations is not recommended, as many current XML 
parsers do not fully support them. 

B.7 XHTML Mobile Profile

When rendering text in a SMIL presentation, authors are able to use XHTML Mobile Profile [47], which contains thirteen
modules. However, some of the modules include non-text information. When referring to an XHTML Mobile Profile
document from a SMIL document, authors should use only the required XHTML Host Language modules: Structure
Module, Text Module, Hypertext Module and List Module. The Image Module, in particular, should not be
used. Images and other non-text content should be included in the SMIL document.

NOTE: An XHTML file including a module which is not part of the XHTML Host Language modules may not
      be shown as intended. Also, an XHTML file which uses elements or attributes from the required
      XHTML Host Language modules and which uses elements or attributes that are not included in XHTML
      Basic Profile [28], may not render correctly on legacy handsets which implement only XHTML Basic.
      These are:

      The start attribute on the 'ol' element in the List module
      The value attribute on the 'li' element in the List module
      The 'b' element in the Presentation module
      The 'big' element in the Presentation module
      The 'hr' element in the Presentation module
      The 'i' element in the Presentation module
      The 'small' element in the Presentation module
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Annex C (normative): 
MIME media types 

C.1 MIME media type H263-2000 

MIME media type name: video 
MIME subtype name: H263-2000 

Required parameters: None 

Optional parameters: 

profile: H.263 profile number, in the range 0 through 8, specifying the supported H.263 annexes/subparts.

level: Level of bitstream operation, in the range 0 through 99, specifying the level of computational complexity of the
decoding process. When no profile and level parameters are specified, Baseline Profile (Profile 0) level 10 are the
default values.

The profile and level specifications can be found in [23]. Note that the RTP payload format for H263-2000 is the same 
as for H263-1998 and is defined in [14], but additional annexes/subparts are specified along with the profiles and levels. 

NOTE: The above text will be replaced with a reference to the RFC describing the H263-2000 MIME media type 
as soon as this becomes available. 
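
The handling of the optional parameters and their defaults can be sketched in Python as follows. The function name parse_h263_2000 and the "name=value" parameter syntax separated by semicolons are assumptions made for this illustration. The sketch also assumes that a parameter which is absent falls back to its default independently of the other one, which the text above does not state explicitly.

```python
def parse_h263_2000(media_type):
    """Return the profile and level of a video/H263-2000 media type string."""
    parts = [part.strip() for part in media_type.split(";")]
    if parts[0].lower() != "video/h263-2000":
        raise ValueError("not an H263-2000 media type: %r" % media_type)
    # Defaults: Baseline Profile (Profile 0), level 10.
    params = {"profile": 0, "level": 10}
    for part in parts[1:]:
        if not part:
            continue
        name, _, value = part.partition("=")
        name = name.strip().lower()
        if name in params:
            params[name] = int(value)
    if not 0 <= params["profile"] <= 8:
        raise ValueError("profile out of range 0..8")
    if not 0 <= params["level"] <= 99:
        raise ValueError("level out of range 0..99")
    return params


print(parse_h263_2000("video/H263-2000"))
print(parse_h263_2000("video/H263-2000; profile=3; level=45"))
```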



C.2 MIME media type sp-midi 



MIME media type name: audio 
MIME subtype name: sp-midi 

Required parameters: none 

Optional parameters: none 

NOTE: The above text will be replaced with a reference to the RFC describing the sp-midi MIME media type as 
soon as this becomes available. 
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Annex D (normative): 

3GP files - codecs and identification 

D.1 General 

The purpose of this annex is to define the necessary structure for integration of the H.263, MPEG-4 video, AMR, AMR- 
WB, AAC and timed text media specific information in a 3GP file. Clause D.2 gives some background information 
about the Sample Description box in the ISO base media file format [50] and clauses D.3 and D.4 about the 
MP4VisualSampleEntry box and the MP4AudioSampleEntry box in the MPEG-4 file format [51]. The definitions of 
the SampleEntry boxes for AMR, AMR-WB and H.263 are given in clauses D.5 to D.8. The SampleEntry box for timed 
text is given in clause D.8a. Finally, the identification of 3GP files is described in clause D.9. 

AMR and AMR-WB data is stored in the stream according to the AMR and AMR-WB storage format for single 
channel header of Annex E [11], without the AMR magic numbers. 



D.2 Sample Description box 



In an ISO file, the Sample Description Box gives detailed information about the coding type used, and any initialisation
information needed for that coding. The Sample Description Box can be found in the ISO file format Box Structure
Hierarchy shown in figure D.1.

Movie Box
    Track Box
        Media Box
            Media Information Box
                Sample Table Box
                    Sample Description Box

Figure D.1: ISO File Format Box Structure Hierarchy

The Sample Description Box can have one or more Sample Entries. Valid Sample Entries already defined for ISO and
MP4 include MP4AudioSampleEntry, MP4VisualSampleEntry and HintSampleEntry. The Sample Entries for AMR
and AMR-WB shall be AMRSampleEntry, for H.263 it shall be H263SampleEntry, and for timed text it shall be
TextSampleEntry.
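
The hierarchy of figure D.1 can be walked with a few lines of Python. The following is a minimal sketch, not a complete file parser: it assumes the usual box header of a 32-bit big-endian size followed by a four-character type, it uses the four-character codes 'moov', 'trak', 'mdia', 'minf', 'stbl' and 'stsd' from the ISO base media file format [50] (they are not listed in the text above), it ignores 64-bit box sizes, and it only follows the first track. The names iter_boxes, find_box and box are illustrative.

```python
import struct

# Movie, Track, Media, Media Information, Sample Table, Sample Description.
PATH = [b"moov", b"trak", b"mdia", b"minf", b"stbl", b"stsd"]


def iter_boxes(data, start=0, end=None):
    """Yield (type, body_start, body_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size < 8 or pos + size > end:
            break  # 64-bit sizes, size 0 and truncated boxes are not handled
        yield box_type, pos + 8, pos + size
        pos += size


def find_box(data, path):
    """Follow path through nested boxes; return (body_start, body_end) or None."""
    start, end = 0, len(data)
    for wanted in path:
        for box_type, body_start, body_end in iter_boxes(data, start, end):
            if box_type == wanted:
                start, end = body_start, body_end
                break
        else:
            return None
    return start, end


def box(box_type, payload=b""):
    """Build a box from a type and a payload (helper for the demonstration)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic data: an unrelated box followed by the nested hierarchy.
inner = box(b"stsd", b"\x00" * 8)
for code in (b"stbl", b"minf", b"mdia", b"trak", b"moov"):
    inner = box(code, inner)
data = box(b"free", b"pad") + inner

print(find_box(data, PATH))
```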

The format of SampleEntry and its fields are explained as follows:

SampleEntry ::= MP4VisualSampleEntry |
                MP4AudioSampleEntry |
                HintSampleEntry |
                TextSampleEntry |
                H263SampleEntry |
                AMRSampleEntry

Table D.1: SampleEntry fields

Field                 Type                        Details                                         Value
MP4VisualSampleEntry                              Entry type for visual samples defined
                                                  in the MP4 specification.
MP4AudioSampleEntry                               Entry type for audio samples defined
                                                  in the MP4 specification.
HintSampleEntry                                   Entry type for hint track samples
                                                  defined in the ISO specification.
TextSampleEntry                                   Entry type for timed text samples
                                                  defined in clause D.8a.16 of the
                                                  present document.
H263SampleEntry                                   Entry type for H.263 visual samples
                                                  defined in clause D.6 of the present
                                                  document.
AMRSampleEntry                                    Entry type for AMR and AMR-WB
                                                  speech samples defined in clause D.5
                                                  of the present document.

Of the six Sample Entries above, only the MP4VisualSampleEntry, MP4AudioSampleEntry, TextSampleEntry,
H263SampleEntry and AMRSampleEntry are taken into consideration, since hint tracks are out of the scope of the
present document.



D.3 MP4VisualSampleEntry box 

The MP4VisualSampleEntry Box is defined as follows:

MP4VisualSampleEntry ::= BoxHeader
                         Reserved_6
                         Data-reference-index
                         Reserved_16
                         Width
                         Height
                         Reserved_4
                         Reserved_4
                         Reserved_4
                         Reserved_2
                         Reserved_32
                         Reserved_2
                         Reserved_2
                         ESDBox

Table D.2: MP4VisualSampleEntry fields

Field                 Type                        Details                                         Value
BoxHeader.Size        Unsigned int(32)
BoxHeader.Type        Unsigned int(32)                                                            'mp4v'
Reserved_6            Unsigned int(8) [6]                                                         0
Data-reference-index  Unsigned int(16)            Index to a data reference to use to retrieve
                                                  the sample data. Data references are stored
                                                  in data reference boxes.
Reserved_16           Const unsigned int(32) [4]                                                  0
Width                 Unsigned int(16)            Maximum width, in pixels, of the stream
Height                Unsigned int(16)            Maximum height, in pixels, of the stream
Reserved_4            Const unsigned int(32)                                                      0x00480000
Reserved_4            Const unsigned int(32)                                                      0x00480000
Reserved_4            Const unsigned int(32)                                                      0
Reserved_2            Const unsigned int(16)                                                      1
Reserved_32           Const unsigned int(8) [32]                                                  0
Reserved_2            Const unsigned int(16)                                                      24
Reserved_2            Const int(16)                                                               -1
ESDBox                                            Box containing an elementary stream
                                                  descriptor for this stream.

The stream type specific information is in the ESDBox structure, as defined in [51]. 

This version of the MP4VisualSampleEntry, with explicit width and height, shall be used for MPEG-4 video streams
conformant to this specification.

NOTE: width and height parameters together may be used to allocate the necessary memory in the playback 
device without need to analyse the video stream. 
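
As the note suggests, a player can read the picture dimensions directly from the sample entry. The following Python sketch unpacks the fixed-size part of the box as listed in table D.2. It assumes big-endian storage without padding, as in the ISO base media file format [50]; the names VISUAL_ENTRY and parse_visual_sample_entry and the sample values are illustrative only.

```python
import struct

# Fixed-size part of the MP4VisualSampleEntry box in the order of table D.2:
# size, type, 6 reserved bytes, data-reference-index, 16 reserved bytes,
# width, height, three 32-bit reserved words, one 16-bit reserved word,
# 32 reserved bytes and two 16-bit reserved words (the last one signed).
VISUAL_ENTRY = struct.Struct(">I4s6xH16xHHIIIH32xHh")


def parse_visual_sample_entry(data):
    """Return the box type and picture dimensions of a visual sample entry."""
    fields = VISUAL_ENTRY.unpack_from(data)
    size, box_type, data_reference_index, width, height = fields[:5]
    return {
        "size": size,
        "type": box_type.decode("ascii"),
        "data_reference_index": data_reference_index,
        "width": width,
        "height": height,
        # Offset at which the ESDBox starts, relative to the start of the entry.
        "esd_offset": VISUAL_ENTRY.size,
    }


# Hypothetical QCIF entry without a trailing ESDBox, for demonstration only.
example = VISUAL_ENTRY.pack(
    VISUAL_ENTRY.size, b"mp4v", 1, 176, 144,
    0x00480000, 0x00480000, 0, 1, 24, -1,
)
entry = parse_visual_sample_entry(example)
print(entry["type"], entry["width"], entry["height"])
```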



D.4 MP4AudioSampleEntry box 

The MP4AudioSampleEntry Box is defined as follows:

MP4AudioSampleEntry ::= BoxHeader
                        Reserved_6
                        Data-reference-index
                        Reserved_8
                        Reserved_2
                        Reserved_2
                        Reserved_4
                        TimeScale
                        Reserved_2
                        ESDBox

Table D.3: MP4AudioSampleEntry fields

Field                 Type                        Details                                         Value
BoxHeader.Size        Unsigned int(32)
BoxHeader.Type        Unsigned int(32)                                                            'mp4a'
Reserved_6            Unsigned int(8) [6]                                                         0
Data-reference-index  Unsigned int(16)            Index to a data reference to use to retrieve
                                                  the sample data. Data references are stored
                                                  in data reference boxes.
Reserved_8            Const unsigned int(32) [2]                                                  0
Reserved_2            Const unsigned int(16)                                                      2
Reserved_2            Const unsigned int(16)                                                      16
Reserved_4            Const unsigned int(32)                                                      0
TimeScale             Unsigned int(16)            Copied from track
Reserved_2            Const unsigned int(16)                                                      0
ESDBox                                            Box containing an elementary stream
                                                  descriptor for this stream.

The stream type specific information is in the ESDBox structure, as defined in [51]. 



D.5 AMRSampleEntry box 

For narrow-band AMR, the box type of the AMRSampleEntry Box shall be 'samr'. For AMR wideband (AMR-WB), 
the box type of the AMRSampleEntry Box shall be 'sawb'. 

The AMRSampleEntry Box is defined as follows:

AMRSampleEntry ::= BoxHeader
                   Reserved_6
                   Data-reference-index
                   Reserved_8
                   Reserved_2
                   Reserved_2
                   Reserved_4
                   TimeScale
                   Reserved_2
                   AMRSpecificBox

Table D.4: AMRSampleEntry fields

Field                 Type                        Details                                         Value
BoxHeader.Size        Unsigned int(32)
BoxHeader.Type        Unsigned int(32)                                                            'samr' or 'sawb'
Reserved_6            Unsigned int(8) [6]                                                         0
Data-reference-index  Unsigned int(16)            Index to a data reference to use to retrieve
                                                  the sample data. Data references are stored
                                                  in data reference boxes.
Reserved_8            Const unsigned int(32) [2]                                                  0
Reserved_2            Const unsigned int(16)                                                      2
Reserved_2            Const unsigned int(16)                                                      16
Reserved_4            Const unsigned int(32)                                                      0
TimeScale             Unsigned int(16)            Copied from media header box of this
                                                  media
Reserved_2            Const unsigned int(16)                                                      0
AMRSpecificBox                                    Information specific to the decoder.

Compared with the MP4AudioSampleEntry Box, the main difference in the AMRSampleEntry Box is the replacement of
the ESDBox, which is specific to MPEG-4 systems, with a box suitable for AMR and AMR-WB. The
AMRSpecificBox field structure is described in clause D.7.



D.6 H263SampleEntry box 

The box type of the H263SampleEntry Box shall be 's263'. 
The H263SampleEntry Box is defined as follows: 






3GPP TS 26.234 version 5.5.0 Release 5 






ETSI TS 126 234 V5.5.0 (2003-06) 



H263SampleEntry ::= BoxHeader
                    Reserved_6
                    Data-reference-index
                    Reserved_16
                    Width
                    Height
                    Reserved_4
                    Reserved_4
                    Reserved_4
                    Reserved_2
                    Reserved_32
                    Reserved_2
                    Reserved_2
                    H263SpecificBox

Table D.5: H263SampleEntry fields

Field                 Type                        Details                                         Value
BoxHeader.Size        Unsigned int(32)
BoxHeader.Type        Unsigned int(32)                                                            's263'
Reserved_6            Unsigned int(8) [6]                                                         0
Data-reference-index  Unsigned int(16)            Index to a data reference to use to retrieve
                                                  the sample data. Data references are stored
                                                  in data reference boxes.
Reserved_16           Const unsigned int(32) [4]                                                  0
Width                 Unsigned int(16)            Maximum width, in pixels, of the stream
Height                Unsigned int(16)            Maximum height, in pixels, of the stream
Reserved_4            Const unsigned int(32)                                                      0x00480000
Reserved_4            Const unsigned int(32)                                                      0x00480000
Reserved_4            Const unsigned int(32)                                                      0
Reserved_2            Const unsigned int(16)                                                      1
Reserved_32           Const unsigned int(8) [32]                                                  0
Reserved_2            Const unsigned int(16)                                                      24
Reserved_2            Const int(16)                                                               -1
H263SpecificBox                                   Information specific to the H.263
                                                  decoder.

Compared with the MP4VisualSampleEntry Box, the main difference in the H263SampleEntry Box is the replacement of the
ESDBox, which is specific to MPEG-4 systems, with a box suitable for H.263. The H263SpecificBox field structure for
H.263 is described in clause D.8.



D.7 AMRSpecificBox field for AMRSampleEntry box

The AMRSpecificBox fields for AMR and AMR-WB shall be as defined in table D.6. The AMRSpecificBox for the 
AMRSampleEntry Box shall always be included if the 3GP file contains AMR or AMR-WB media. 

Table D.6: The AMRSpecificBox fields for AMRSampleEntry

Field                 Type                        Details                                         Value
BoxHeader.Size        Unsigned int(32)
BoxHeader.Type        Unsigned int(32)                                                            'damr'
DecSpecificInfo       AMRDecSpecStruc             Structure which holds the AMR and AMR-WB
                                                  Specific information

BoxHeader Size and Type: indicate the size and type of the AMR decoder-specific box. The type must be 'damr'.

DecSpecificInfo: the structure where the AMR and AMR-WB stream specific information resides.

The AMRDecSpecStruc is defined as follows:

struct AMRDecSpecStruc {
    Unsigned int (32) vendor
    Unsigned int (8)  decoder_version
    Unsigned int (16) mode_set
    Unsigned int (8)  mode_change_period
    Unsigned int (8)  frames_per_sample
}

The definitions of AMRDecSpecStruc members are as follows: 

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about 
the vendor whose codec is used to create the encoded data. It is an informative field which may be used by the decoding 
end. If a manufacturer already has a four character code, it is recommended that it uses the same code in this field. Else, 
it is recommended that the manufacturer creates a four character code which best addresses the manufacturer's name. It 
can be safely ignored. 

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way. 
This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder 
version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored. 

mode_set: the active codec modes. Each bit of the mode_set parameter corresponds to one mode. The bit index of the 
mode is calculated according to the 4 bit FT field of the AMR or AMR-WB frame structure. The mode_set bit structure 
is as follows: (B15xxxxxxB8B7xxxxxxB0) where B0 (Least Significant Bit) corresponds to Mode 0, and B8 
corresponds to Mode 8. 

The mapping of existing AMR modes to FT is given in table 1.a in [19]. A value of 0x81FF means all modes and 
comfort noise frames are possibly present in an AMR stream. 

The mapping of existing AMR-WB modes to FT is given in Table 1.a in TS 26.201 [37]. A value of 0x83FF means all 
modes and comfort noise frames are possibly present in an AMR-WB stream. 

As an example, if mode_set = 00000001 10010101b, only Modes 0, 2, 4, 7 and 8 are present in the stream. 
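
The bit-to-mode mapping can be illustrated with a short Python sketch. The function name modes_in_mode_set is hypothetical and not part of this specification; it only lists the bit positions that are set in a 16-bit mode_set value.

```python
def modes_in_mode_set(mode_set: int) -> list[int]:
    """Return the bit indices (FT values) that are set in a 16-bit mode_set."""
    if not 0 <= mode_set <= 0xFFFF:
        raise ValueError("mode_set must fit in 16 bits")
    # Bit B0 (least significant) corresponds to Mode 0, B8 to Mode 8, and so on.
    return [bit for bit in range(16) if mode_set & (1 << bit)]


# The example value from the text above.
example = modes_in_mode_set(0b0000000110010101)  # [0, 2, 4, 7, 8]
```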

mode_change_period: defines a number N, which restricts the mode changes only at a multiple of N frames. If no 
restriction is applied, this value should be set to 0. If mode_change_period is not 0, the following restrictions apply to it 
according to the frames_per_sample field: 

if (mode_change_period < frames_per_sample) 
    frames_per_sample = k x (mode_change_period) 
else if (mode_change_period > frames_per_sample) 
    mode_change_period = k x (frames_per_sample) 

where k : integer [2, ...] 

If mode_change_period is equal to frames_per_sample, then the mode is the same for all frames inside one sample. 
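
As an illustration only, the restriction on mode_change_period could be checked as in the following Python sketch. The function name period_is_consistent is an assumption made for this example; the logic follows the rule that the larger of the two values is an integer multiple (k of 2 or more) of the smaller one.

```python
def period_is_consistent(mode_change_period: int, frames_per_sample: int) -> bool:
    """Check mode_change_period against frames_per_sample."""
    if mode_change_period == 0:
        return True  # no restriction on mode changes
    small, large = sorted((mode_change_period, frames_per_sample))
    if small == large:
        return True  # the mode is the same for all frames inside one sample
    # The larger value must be k times the smaller one; since the values
    # differ, an exact multiple implies k >= 2.
    return small > 0 and large % small == 0
```

For instance, a mode_change_period of 2 with 10 frames per sample passes the check (k = 5), whereas a mode_change_period of 3 with 10 frames per sample does not.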

frames_per_sample: defines the number of frames to be considered as 'one sample' inside the 3GP file. This number 
shall be greater than 0 and less than 16. A value of 1 means each frame is treated as one sample. A value of 10 means 
that 10 frames (of duration 20 msec each) are put together and treated as one sample. It must be noted that, in this case, 
one sample duration is 20 (msec/frame) x 10 (frame) = 200 msec. For the last sample of the stream, the number of 
frames can be smaller than frames_per_sample, if the number of remaining frames is smaller than frames_per_sample. 
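
The following Python sketch shows one way a reader might unpack the AMRDecSpecStruc and derive the sample duration. It assumes that the nine bytes of the structure are stored back to back in big-endian byte order, as is usual for box-structured files; the function name parse_amr_dec_spec and the returned dictionary keys are hypothetical.

```python
import struct


def parse_amr_dec_spec(payload: bytes) -> dict:
    """Unpack the 9-byte AMRDecSpecStruc that follows the 'damr' box header."""
    # Assumed layout: 4-byte vendor, 8-bit version, 16-bit mode_set,
    # 8-bit mode_change_period, 8-bit frames_per_sample, big-endian, no padding.
    vendor, version, mode_set, period, frames = struct.unpack(">4sBHBB", payload[:9])
    return {
        "vendor": vendor.decode("ascii", errors="replace"),
        "decoder_version": version,
        "mode_set": mode_set,
        "mode_change_period": period,
        "frames_per_sample": frames,
        # Each frame lasts 20 msec, so one sample spans 20 x frames_per_sample.
        "sample_duration_ms": 20 * frames,
    }
```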

NOTE 1: The "hinter", for the creation of the hint tracks, can use the information given by the AMRDecSpecStruc 
members. 

NOTE 2: The following AMR MIME parameters are not relevant to PSS: {mode_set, mode_change_period, 
mode_change_neighbor}. PSS servers should not send these parameters in SDP, and PSS clients shall 
ignore these parameters if received. 



D.8 H263SpecificBox field for H263SampleEntry box 

The H263SpecificBox fields for H.263 shall be as defined in table D.7. The H263SpecificBox for the 
H263SampleEntry Box shall always be included if the 3GP file contains H.263 media. 

The H263SpecificBox for H.263 is composed of the following fields. 

Table D.7: The H263SpecificBox fields for H263SampleEntry 



Field              Type                Details                                                Value
BoxHeader.Size     Unsigned int(32)
BoxHeader.Type     Unsigned int(32)                                                           'd263'
DecSpecificInfo    H263DecSpecStruc    Structure which holds the H.263 Specific information
BitrateBox                             Specific bitrate information (optional)




BoxHeader Size and Type: indicate the size and type of the H.263 decoder-specific box. The type must be 'd263'. 

DecSpecificInfo: This is the structure where the H.263 stream specific information resides. 

H263DecSpecStruc is defined as follows: 

struct H263DecSpecStruc { 
    Unsigned int (32) vendor 
    Unsigned int (8)  decoder_version 
    Unsigned int (8)  H263_Level 
    Unsigned int (8)  H263_Profile 
} 



The definitions of H263DecSpecStruc members are as follows: 

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about 
the vendor whose codec is used to create the encoded data. It is an informative field which may be used by the decoding 
end. If a manufacturer already has a four character code, it is recommended that it uses the same code in this field. Else, 
it is recommended that the manufacturer creates a four character code which best addresses the manufacturer's name. It 
can be safely ignored. 

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way. 
This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder 
version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored. 

H263_Level and H263_Profile: These two parameters define which H263 profile and level is used. These parameters 
are based on the MIME media type video/H263-2000. The profile and level specifications can be found in [23]. 

EXAMPLE 1: H.263 Baseline = {H263_Level = 10, H263_Profile = 0} 

EXAMPLE 2: H.263 Profile 3 @ Level 10 = {H263_Level = 10, H263_Profile = 3} 
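
A minimal sketch of how a file writer might serialise such a box is given below in Python. It assumes big-endian storage, an 8-byte box header whose size field counts the header itself, and no optional BitrateBox; the function name build_d263_box is hypothetical.

```python
import struct


def build_d263_box(vendor: bytes, decoder_version: int, level: int, profile: int) -> bytes:
    """Serialise a 'd263' box: 8-byte box header followed by H263DecSpecStruc."""
    if len(vendor) != 4:
        raise ValueError("vendor must be a four character code")
    # vendor (32 bits), decoder_version, H263_Level, H263_Profile (8 bits each).
    body = struct.pack(">4sBBB", vendor, decoder_version, level, profile)
    size = 8 + len(body)  # BoxHeader.Size includes the header bytes
    return struct.pack(">I4s", size, b"d263") + body


# H.263 Baseline: H263_Level = 10, H263_Profile = 0.
baseline = build_d263_box(b"VXYZ", 0, 10, 0)
```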

NOTE: The "hinter", for the creation of the hint tracks, can use the information given by the H263DecSpecStruc 
members. 

The BitrateBox field shall be as defined in table D.7.1. The BitrateBox may be included if the 3GP file contains H.263 
media. 

The BitrateBox is composed of the following fields. 



Table D.7.1: The BitrateBox fields 

Field              Type                Details                                         Value
BoxHeader.Size     Unsigned int(32)
BoxHeader.Type     Unsigned int(32)                                                    'bitr'
DecBitrateInfo     DecBitrStruc        Structure which holds the Bitrate information




BoxHeader Size and Type: indicate the size and type of the bitrate box. The type must be 'bitr'. 

DecBitrateInfo: This is the structure where the stream bitrate information resides. 

DecBitrStruc is defined as follows: 

struct DecBitrStruc { 
    Unsigned int (32) Avg_Bitrate 
    Unsigned int (32) Max_Bitrate 
} 

The definitions of DecBitrStruc members are as follows: 

Avg_Bitrate: the average bitrate in bits per second of this elementary stream. For streams with variable bitrate this 
value shall be set to zero. 

Max_Bitrate: the maximum bitrate in bits per second of this elementary stream in any time window of one second 
duration. 
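
For illustration, a complete BitrateBox could be read as in the Python sketch below. It assumes big-endian storage and a fixed box size of 16 bytes (8 bytes of header plus two 32-bit values); the function name parse_bitr_box is hypothetical.

```python
import struct


def parse_bitr_box(data: bytes) -> tuple[int, int]:
    """Return (Avg_Bitrate, Max_Bitrate) from a serialised 'bitr' box."""
    size, box_type, avg_bitrate, max_bitrate = struct.unpack(">I4sII", data[:16])
    if box_type != b"bitr" or size != 16:
        raise ValueError("not a 16-byte 'bitr' box")
    # An Avg_Bitrate of zero marks a variable bitrate stream.
    return avg_bitrate, max_bitrate
```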



D.8a Timed Text Format 



This clause defines the format of timed text in downloaded files. In this release, timed text is downloaded, not 
streamed. 

Operators may specify additional rules and restrictions when deploying terminals, in addition to this specification, and 
behavior that is optional here may be mandatory for particular deployments. In particular, the required character set is 
almost certainly dependent on the geography of the deployment. 



D.8a.1 Unicode Support 



Text in this specification uses the Unicode 3.0 [30] standard. Terminals shall correctly decode both UTF-8 and UTF-16 
into the required characters. If a terminal receives a Unicode code, which it cannot display, it shall display a predictable 
result. It shall not treat multi-byte UTF-8 characters as a series of ASCII characters, for example. 

Authors should create fully-composed Unicode; terminals are not required to handle decomposed sequences for which 
there is a fully-composed equivalent. 

Terminals shall conform to the conformance statement in Unicode 3.0 section 3.1. 

Text strings for display and font names are uniformly coded in UTF-8, or start with a UTF-16 BYTE ORDER MARK 
(\uFEFF) and by that indicate that the string which starts with the byte order mark is in UTF-16. Terminals shall 
recognise the byte-order mark in this byte order; they are not required to recognise byte-reversed UTF-16, indicated by 
a byte-reversed byte-order mark. 

D.8a.2 Bytes, Characters, and Glyphs 



This clause uses these terms carefully. Since multi-byte characters are permitted (i.e. 16-bit Unicode characters), the 
number of characters in a string may not be the number of bytes. Also, a byte-order-mark is not a character at all, 
though it occupies two bytes. So, for example, storage lengths are specified as byte-counts, whereas highlighting is 
specified using character offsets. 
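
The difference between byte counts and character counts can be seen in the following Python sketch. It assumes that the required byte-order mark is stored as the bytes FE FF (most significant byte first) and that all characters are 16-bit Unicode characters; the function name decode_text_string is hypothetical.

```python
def decode_text_string(raw: bytes) -> str:
    """Decode a string that is UTF-8 unless it starts with a UTF-16 byte-order mark."""
    if raw[:2] == b"\xFE\xFF":
        # The mark occupies two bytes but is not a character, so it is dropped;
        # the remainder is decoded as big-endian UTF-16.
        return raw[2:].decode("utf-16-be")
    return raw.decode("utf-8")


stored = b"\xFE\xFF" + "caf\u00e9".encode("utf-16-be")
text = decode_text_string(stored)
byte_count = len(stored)  # 10 bytes of storage
char_count = len(text)    # 4 characters
```

Storage lengths would use byte_count, whereas highlighting offsets would index into the four characters of text.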

It should also be noted that in some writing systems the number of glyphs rendered might be different again. For 
example, in English, the characters 'fi' are sometimes rendered as a single ligature glyph. 

In this specification, the first character is at offset 0 in the string. In records specifying both a start and end offset, the 
end offset shall be greater than or equal to the start offset. In cases where several offset specifications occur in 
sequence, the start offset of an element shall be greater than or equal to the end offset of the preceding element. 

D.8a.3 Character Set Support 

All terminals shall be able to render Unicode characters in these ranges: 

a) basic ASCII and Latin-1 (\u0000 to \u00FF), though not all the control characters in this range are needed; 

b) the Euro currency symbol (\u20AC); 

c) telephone and ballot symbols (\u260E through \u2612). 

Support for the following characters is recommended but not required: 

a) miscellaneous technical symbols (\u2300 through \u2335); 

b) 'Zapf Dingbats': locations \u2700 through \u27AF, and the locations where some symbols have been relocated 
(e.g. \u2605, Black star). 

The private use characters \u0091 and \u0092, and the initial range of the private use area \uE000 through \uE0FF are 
reserved in this specification. For these Unicode values, and for control characters for which there is no defined 
graphical behaviour, the terminal shall not display any result: neither a glyph is shown nor is the current rendering 
position changed. 



D.8a.4 Font Support 



Fonts are specified in this specification by name, size, and style. There are three special names which shall be 
recognized by the terminal: Serif, Sans-Serif, and Monospace. It is strongly recommended that these be different fonts 
for the required characters from ASCII and Latin-1. For many other characters, the terminal may have a limited set or 
only a single font. Terminals requested to render a character where the selected font does not support that character 
should substitute a suitable font. This ensures that languages with only one font (e.g. Asian languages) or symbols for 
which there is only one form are rendered. 

Fonts are requested by name, in an ordered list. Authors should normally specify one of the special names last in the 
list. 

Terminals shall support a pixel size of 12 (on a 72dpi display, this would be a point size of 12). If a size is requested 
other than the size(s) supported by the terminal, the next smaller supported size should be used. If the requested size is 
smaller than the smallest supported size, the terminal should use the smallest supported size. 
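
The size substitution rule can be sketched in Python as follows. The function name choose_font_size and the representation of the supported sizes as a list of integers are assumptions made for this example.

```python
def choose_font_size(requested: int, supported: list[int]) -> int:
    """Pick the supported pixel size to use for a requested pixel size."""
    if requested in supported:
        return requested
    smaller = [size for size in supported if size < requested]
    # Use the next smaller supported size, or the smallest supported size
    # when the request is below everything the terminal offers.
    return max(smaller) if smaller else min(supported)
```

A terminal supporting sizes 10, 12 and 18 would therefore render a request for size 14 at size 12, and a request for size 8 at size 10.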

Terminals shall support unstyled text for those characters they support. They may also support bold, italic (oblique) and 
bold-italic. If a style is requested which the terminal does not support, it should substitute a supported style; a character 
shall be rendered if the terminal has that character in any style of any font. 

D.8a.5 Fonts and Metrics 

Within the sample description, a complete list of the fonts used in the samples is found. This enables the terminal to 
pre-load them, or to decide on font substitution. 

Terminals may use varying versions of the same font. For example, here is the same text rendered on two systems; it 
was authored on the first, where it just fitted into the text box. 

EXAMPLE: 

[Image: the same text string as rendered on the two systems] 



Authors should be aware of this possible variation, and provide text box areas with some 'slack' to allow for rendering 
variations. 



D.8a.6 Colour Support 



The colour of both text and background are indicated in this specification using RGB values. Terminals are not 
required to be able to display all colours in the RGB space. Terminals with a limited colour display, with only gray-scale 
display, and with only black-and-white are permissible. If a terminal has a limited colour capability it should 
substitute a suitable colour; dithering of text may be used but is not usually appropriate as it results in "fuzzy" display. 
If colour substitution is performed, the substitution shall be consistent: the same RGB colour shall result consistently in 
the same displayed colour. If the same colour is chosen for background and text, then the text shall be invisible (unless 
a style such as highlight changes its colour). If different colours are specified for the background and text, the terminal 
shall map these to different colours, so that the text is visible. 

Colours in this specification also have an alpha or transparency value. In this specification, a transparency value of 0 
indicates a fully transparent colour, and a value of 255 indicates fully opaque. Support for partial or full transparency is 
optional. 'Keying' text (text rendered on a transparent background) is done by using a background colour which is fully 
transparent. 'Keying' text over video or pictures, and support for transparency in general, can be complex and may 
require double-buffering, and its support is optional in the terminal. Content authors should beware that if they specify 
a colour which is not fully opaque, and the content is played on a terminal not supporting it, the affected area (the entire 
text box for a background colour) will be fully opaque and will obscure visual material behind it. Visual material with 
transparency is layered closer to the viewer than the material which it partially obscures. 

D.8a.7 Text rendering position and composition 

Text is rendered within a region (a concept derived from SMIL). There is a text box set within that region. This 
permits the terminal to position the text within the overall presentation, and also to render the text appropriately given 
the writing direction. For text written left to right, for example, the first character would be rendered at, or near, the left 
edge of the box, and with its baseline down from the top of the box by one baseline height (a value derived from the 
font and font size chosen). Similar considerations apply to the other writing directions. 

Within the region, text is rendered within a text box. There is a default text box set, which can be over-ridden by a 
sample. 

The text box is filled with the background colour; after that the text is painted in the text colour. If highlighting is 
requested one or both of these colours may vary. 

Terminals may choose to anti-alias their text, or not. 

The text region and layering are defined using structures from the ISO base media file format. 

This track header box is used for the text track: 

aligned(8) class TrackHeaderBox 
    extends FullBox('tkhd', version, flags) { 
    if (version==1) { 
        unsigned int(64) creation_time; 
        unsigned int(64) modification_time; 
        unsigned int(32) track_ID; 
        const unsigned int(32) reserved = 0; 
        unsigned int(64) duration; 
    } else { // version==0 
        unsigned int(32) creation_time; 
        unsigned int(32) modification_time; 
        unsigned int(32) track_ID; 
        const unsigned int(32) reserved = 0; 
        unsigned int(32) duration; 
    } 
    const unsigned int(32)[2] reserved = 0; 
    int(16) layer; 
    template int(16) alternate_group = 0; 
    template int(16) volume = 0; 
    const unsigned int(16) reserved = 0; 
    template int(32)[9] matrix = 
        { 0x00010000, 0, 0, 0, 0x00010000, 0, tx, ty, 0x40000000 }; 
        // unity matrix 
    unsigned int(32) width; 
    unsigned int(32) height; 
} 



Visually composed tracks including video and text are layered using the 'layer' value. This compares, for example, to 
z-index in SMIL. More negative layer values are towards the viewer. (This definition is compatible with that in 
ISO/MJ2). 

The region is defined by the track width and height, and translation offset. This corresponds to the SMIL region. The 
width and height are stored in the track header fields above. The sample description sets a text box within the region, 
which can be over-ridden by the samples. 

The translation values are stored in the track header matrix in the following positions: 

{ 0x00010000,0,0, 0,0x00010000,0, tx, ty, 0x40000000 } 

These values are fixed-point 16.16 values, here restricted to be integers (the lower 16 bits of each value shall be zero). 
The X axis increases from left to right; the Y axis from top to bottom. (This use of the matrix is conformant with 
ISO/MJ2.) 

So, for example, a centered region of size 200x20, positioned below a video of size 320x240, would have track_width 
set to 200 (width = 0x00c80000), track_height set to 20 (height = 0x00140000), tx = (320-200)/2 = 60, and ty = 240. 
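
The arithmetic of this example can be reproduced with the Python sketch below. The function names to_fixed_16_16 and text_track_geometry are hypothetical, and the sketch assumes that the translation values are placed in the matrix as 16.16 fixed-point numbers in the same way as the width and height.

```python
def to_fixed_16_16(pixels: int) -> int:
    """Encode an integer pixel value as 16.16 fixed point (lower 16 bits zero)."""
    return pixels << 16


def text_track_geometry(video_w: int, video_h: int, region_w: int, region_h: int) -> dict:
    """Centre a text region horizontally and place it directly below the video."""
    tx = (video_w - region_w) // 2  # horizontal offset from the coordinate origin
    ty = video_h                    # vertical offset: just below the video
    return {
        "width": to_fixed_16_16(region_w),
        "height": to_fixed_16_16(region_h),
        "matrix": [0x00010000, 0, 0, 0, 0x00010000, 0,
                   to_fixed_16_16(tx), to_fixed_16_16(ty), 0x40000000],
    }


geometry = text_track_geometry(320, 240, 200, 20)
```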

Since matrices are not used on the video tracks, all video tracks are set at the coordinate origin. Figure D.2 provides an 
overview: 

Figure D.2: Illustration of text rendering position and composition 

The top and left positions of the text track are determined by tx and ty, which are the translation values from the 
coordinate origin (since the video track is at the origin, this is also the offset from the video track). The default text box 
set in the sample description sets the rendering area unless over-ridden by a 'tbox' in the text sample. The box values 
are defined as the relative values from the top and left positions of the text track. 

It should be noted that this only specifies the relationship of the tracks within a single 3GP file. If a SMIL presentation 
lays out multiple files, their relative position is set by the SMIL regions. Each file is assigned to a region, and then 
within those regions the spatial relationship of the tracks is defined. 

D.8a.8 Marquee Scrolling 



Text can be 'marquee' scrolled in this specification (compare this to Internet Explorer's marquee construction). When 
scrolling is performed, the terminal first calculates the position in which the text would be displayed with no scrolling 
requested. Then: 

a) If scroll-in is requested, the text is initially invisible, just outside the text box, and enters the box in the indicated 
direction, scrolling until it is in the normal position; 

b) If scroll-out is requested, the text scrolls from the normal position, in the indicated direction, until it is 
completely outside the text box. 

The rendered text is clipped to the text box in each display position, as always. This means that it is possible to scroll a 
string which is longer than can fit into the text box, progressively disclosing it (for example, like a ticker-tape). Note 
that both scroll in and scroll out may be specified; the text scrolls continuously from its invisible initial position, 
through the normal position, and out to its final position. 

If a scroll-delay is specified, the text stays steady in its normal position (not initial position) for the duration of the 
delay; so the delay is after a scroll-in but before a scroll-out. This means that the scrolling is not continuous if both are 
specified. So without a delay, the text is in motion for the duration of the sample. For a scroll in, it reaches its normal 
position at the end of the sample duration; with a delay, it reaches its normal position before the end of the sample 
duration, and remains in its normal position for the delay duration, which ends at the end of the sample duration. 
Similarly for a scroll out, the delay happens in its normal position before scrolling starts. If both scroll in, and scroll out 
are specified, with a delay, the text scrolls in, stays stationary at the normal position for the delay period, and then 
scrolls out - all within the sample duration. 

The speed of scrolling is calculated so that the complete operation takes place within the duration of the sample. 
Therefore the scrolling has to occur within the time left after scroll-delay has been subtracted from the sample duration. 
Note that the time it takes to scroll a string may depend on the rendered length of the actual text string. Authors should 
consider whether the scrolling speed that results will exceed that at which text on a wireless terminal could be 
readable. 

Terminals may use simple algorithms to determine the actual scroll speed. For example, the speed may be determined 
by moving the text an integer number of pixels in every update cycle. Terminals should choose a scroll speed which is 
as fast or faster than needed so that the scroll operation completes within the sample duration. 
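
One possible simple algorithm of this kind is sketched below in Python. It is an illustration under stated assumptions, not a required method: the function name scroll_step_pixels, the fixed update interval, and the idea of rounding the step up to a whole number of pixels are all choices made for this example.

```python
import math


def scroll_step_pixels(distance_px: int, sample_ms: int, delay_ms: int, update_ms: int) -> int:
    """Whole pixels to move per update so that the scroll ends within the sample."""
    if distance_px <= 0:
        return 0  # nothing to scroll
    scroll_ms = sample_ms - delay_ms  # time left once the scroll-delay is subtracted
    updates = scroll_ms // update_ms  # complete update cycles available
    if updates <= 0:
        raise ValueError("no time left for scrolling")
    # Rounding up makes the scroll as fast as or faster than strictly needed.
    return math.ceil(distance_px / updates)
```

With a 5000 msec sample, a 1000 msec scroll-delay and an update every 40 msec, there are 100 update cycles, so a 301 pixel scroll would use a step of 4 pixels per update.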

Terminals are not required to handle dynamic or stylistic effects such as highlight, dynamic highlight, or href links on 
scrolled text. 

The scrolling direction is set by a two-bit field, with the following possible values: 

00b - text is vertically scrolled up ('credits style'), entering from the bottom and leaving towards 
the top. 

01b - text is horizontally scrolled ('marquee style'), entering from the right and leaving towards the left. 

10b - text is vertically scrolled down, entering from the top and leaving towards the bottom. 

11b - text is horizontally scrolled, entering from the left and leaving towards the right. 



D.8a.9 Language 



The human language used in this stream is declared by the language field of the media-header box in this track. It is an 
ISO 639/T 3-letter code. The knowledge of the language used might assist searching, or speaking the text. Rendering 
is language neutral. Note that the values 'und' (undetermined) and 'mul' (multiple languages) might occur. 



D.8a.10 Writing direction 



Writing direction specifies the way in which the character position changes after each character is rendered. It also 
implies a start-point for the rendering within the box. 

Terminals shall support the determination of writing direction, for those characters they support, according to the 
Unicode 3.0 specification. Note that the only required characters can all be rendered using left-right behaviour. A 
terminal which supports characters with right-left writing direction shall support the right-left composition rules 
specified in Unicode. 

Terminals may also set, or allow the user to set, an overall writing direction, either explicitly or implicitly (e.g. by the 
language selection). This affects layout. For example, if upper-case letters are left-right, and lower-case right-left, and 
the Unicode string ABCdefGHI is to be rendered, it would appear as ABCfedGHI on a terminal with overall left-right 
writing (English, for example) and GHIfedABC on a system with overall right-left (Hebrew, for example). 

Terminals are not required to support the bi-directional ordering codes (\u200E, \u200F and \u202A through \u202E). 

If vertical text is requested by the content author, characters are laid out vertically from top to bottom. The terminal 
may choose to render different glyphs for this writing direction (e.g. a horizontal parenthesis), but in general the glyphs 
should not be rotated. The direction in which lines advance (left-right, as used for European languages, or right-left, as 
used for Asian languages) is set by the terminal, possibly by a direct or indirect user preference (e.g. a language setting). 
Terminals shall support vertical writing of the required character set. It is recommended that terminals support vertical 
writing of text in those languages commonly written vertically (e.g. Asian languages). If vertical text is requested for 
characters which the terminal cannot render vertically, the terminal may behave as if the characters were not available. 



D.8a.11 Text wrap 



Automatic wrapping of text from line to line is complex, and can require hyphenation rules and other complex 
language-specific criteria. For these reasons, text is not wrapped in this specification. If a string is too long to be drawn 
within the box, it is clipped. The terminal may choose whether to clip at the pixel boundary, or to render only whole 
glyphs. 

There may be multiple lines of text in a sample (hard wrap). Terminals shall start a new line for the Unicode characters 
line separator (\u2028), paragraph separator (\u2029) and line feed (\u000A). It is recommended that terminals follow 
Unicode Technical Report 13 [48]. Terminals should treat carriage return (\u000D), next line (\u0085) and CR+LF 
(\u000D\u000A) as new line. 
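
A hedged Python sketch of this hard-wrap behaviour is shown below. The function name split_lines is hypothetical; the sketch treats line separator, paragraph separator, line feed, carriage return, next line, and the carriage return plus line feed pair each as a single new line.

```python
import re

# The two-character CR+LF pair is listed first so that it counts as one new line.
_NEW_LINE = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")


def split_lines(text: str) -> list[str]:
    """Split a text sample string into the lines a terminal would render."""
    return _NEW_LINE.split(text)
```

Each resulting line would then be clipped to the text box independently, since no automatic wrapping is applied.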

D.8a.12 Highlighting, Closed Caption, and Karaoke 

Text may be highlighted for emphasis. Since this is a non-interactive system, solely for text display, the utility of this 
function may be limited. 

Dynamic highlighting, used for Closed Caption and Karaoke highlighting, is an extension of highlighting. Successive 
contiguous sub-strings of the text sample are highlighted at the specified times. 

D.8a.13 Media Handler 

A text stream is its own unique stream type. For the 3GPP file format, the handler-type within the 'hdlr' box shall be 
'text'. 

D.8a.14 Media Handler Header 

The 3G text track uses an empty null media header ('nmhd'), called Mpeg4MediaHeaderBox in the MP4 specification 
[51], in common with other MPEG streams. 

aligned(8) class Mpeg4MediaHeaderBox 
    extends FullBox('nmhd', version = 0, flags) { 
} 

D.8a.15 Style record 

Both the sample format and the sample description contain style records, and so it is defined once here for compactness. 

aligned(8) class StyleRecord { 
    unsigned int(16) startChar; 
    unsigned int(16) endChar; 
    unsigned int(16) font-ID; 
    unsigned int(8)  face-style-flags; 
    unsigned int(8)  font-size; 
    unsigned int(8)  text-color-rgba[4]; 
} 

startChar: character offset of the beginning of this style run (always 0 in a sample description) 

endChar: first character offset to which this style does not apply (always 0 in a sample description); shall be 
greater than or equal to startChar. All characters, including line-break characters and any other 
non-printing characters, are included in the character counts. 

font-ID: font identifier from the font table; in a sample description, this is the default font 

face-style-flags: in the absence of any bits set, the text is plain 
    1 bold 
    2 italic 
    4 underline 

font-size: font size (nominal pixel size, in essentially the same units as the width and height) 

text-color-rgba: rgb colour, 8 bits each of red, green, blue, and an alpha (transparency) value 

Terminals shall support plain text, and underlined horizontal text, and may support bold, italic and bold-italic depending 
on their capabilities and the font selected. If a style is not supported, the text shall still be rendered in the closest style 
available. 
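
As an illustration, a 12-byte style record could be unpacked as in the following Python sketch. Big-endian storage without padding is assumed, and the function name parse_style_record and the dictionary keys are hypothetical.

```python
import struct

_FACE_STYLES = ((1, "bold"), (2, "italic"), (4, "underline"))


def parse_style_record(data: bytes) -> dict:
    """Unpack one StyleRecord (three 16-bit fields followed by six 8-bit fields)."""
    start, end, font_id, flags, size, r, g, b, a = struct.unpack(">HHHBB4B", data[:12])
    return {
        "startChar": start,
        "endChar": end,
        "font_id": font_id,
        # No bits set means plain text, which yields an empty list here.
        "styles": [name for bit, name in _FACE_STYLES if flags & bit],
        "font_size": size,
        "rgba": (r, g, b, a),
    }
```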



D.8a.16 Sample Description Format 



The sample table box ('stbl') contains sample descriptions for the text track. Each entry is a sample entry box of type 
'tx3g'. This name defines the format both of the sample description and the samples associated with that sample 
description. Terminals shall not attempt to decode or display sample descriptions with unrecognised names, nor the 
samples attached to those sample descriptions. 

It starts with the standard fields (the reserved bytes and the data reference index), and then some text-specific fields. 
Some fields can be overridden or supplemented by additional boxes within the text sample itself. These are discussed 
below. 

There can be multiple text sample descriptions in the sample table. If the overall text characteristics do not change from 
one sample to the next, the same sample description is used. Otherwise, a new sample description is added to the table. 
Not all changes to text characteristics require a new sample description, however. Some characteristics, such as font 
size, can be overridden on a character-by-character basis. Some, such as dynamic highlighting, are not part of the text 
sample description and can be changed dynamically. 

The TextDescription extends the regular sample entry with the following fields. 

class FontRecord { 
    unsigned int(16) font-ID; 
    unsigned int(8)  font-name-length; 
    unsigned int(8)  font[font-name-length]; 
} 

class FontTableBox() extends Box('ftab') { 
    unsigned int(16) entry-count; 
    FontRecord font-entry[entry-count]; 
} 

class BoxRecord { 
    signed int(16) top; 
    signed int(16) left; 
    signed int(16) bottom; 
    signed int(16) right; 
} 

class TextSampleEntry() extends SampleEntry('tx3g') { 
    unsigned int(32) displayFlags; 
    signed int(8)    horizontal-justification; 
    signed int(8)    vertical-justification; 
    unsigned int(8)  background-color-rgba[4]; 
    BoxRecord        default-text-box; 
    StyleRecord      default-style; 
    FontTableBox     font-table; 
} 



displayFlags: 
    scroll In                0x00000020 
    scroll Out               0x00000040 
    scroll direction         0x00000180    / see above for values 
    continuous karaoke       0x00000800 
    write text vertically    0x00020000 

horizontal and vertical justification:     / two eight-bit values from the following list: 
    left, top                 0 
    centered                  1 
    bottom, right            -1 

background-color-rgba: 
    rgb color, 8 bits each of red, green, blue, and an alpha (transparency) value 

Default text box: the default text box is set by four values, relative to the text region; it may be over-ridden in 
samples; 

style record of default style: startChar and endChar shall be zero in a sample description 
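
The displayFlags bit masks can be decoded as in the Python sketch below. The constant and function names are hypothetical, and the sketch assumes that the two-bit scroll direction value is read directly from the bits selected by the mask 0x00000180.

```python
SCROLL_IN = 0x00000020
SCROLL_OUT = 0x00000040
SCROLL_DIRECTION_MASK = 0x00000180
CONTINUOUS_KARAOKE = 0x00000800
WRITE_VERTICALLY = 0x00020000


def decode_display_flags(flags: int) -> dict:
    """Split a 32-bit displayFlags value into its individual settings."""
    return {
        "scroll_in": bool(flags & SCROLL_IN),
        "scroll_out": bool(flags & SCROLL_OUT),
        # The mask covers bits 7 and 8, so shifting right by 7 gives 0 to 3.
        "scroll_direction": (flags & SCROLL_DIRECTION_MASK) >> 7,
        "continuous_karaoke": bool(flags & CONTINUOUS_KARAOKE),
        "write_vertically": bool(flags & WRITE_VERTICALLY),
    }
```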

The text box is inset within the region defined by the track translation offset, width, and height. The values in the box 
are relative to the track region, and are uniformly coded with respect to the pixel grid. So, for example, the default text 
box for a track at the top left of the track region and 50 pixels high and 100 pixels wide is { 0, 0, 50, 100 }. 

A font table shall follow these fields, to define the complete set of fonts used. The font table is a box of type 'ftab'. 
Every font used in the samples is defined here by name. Each entry consists of a 16-bit local font identifier, and a font 
name, expressed as a string, preceded by an 8-bit field giving the length of the string in bytes. The name is expressed in 
UTF-8 characters, unless preceded by a UTF-16 byte-order-mark, whereupon the rest of the string is in 16-bit Unicode 
characters. The string should be a comma separated list of font names to be used as alternative font, in preference 
order. The special names "Serif", "Sans-serif" and "Monospace" may be used. The terminal should use the first font in 
the list which it can support; if it cannot support any for a given character, but it has a font which can, it should use that 
font. Note that this substitution is technically character by character, but terminals are encouraged to keep runs of 
characters in a consistent font where possible. 



D.8a.17 Sample Format 



Each sample in the media data consists of a string of text, optionally followed by sample modifier boxes. 

For example, if one word in the sample has a different size than the others, a 'styl' box is appended to that sample, 
specifying a new text style for those characters, and for the remaining characters in the sample. This overrides the style 
in the sample description. These boxes are present only if they are needed. If all text conforms to the sample 
description, and no characteristics are applied that the sample description does not cover, no boxes are inserted into the 
sample data. 

class TextSampleModifierBox(type) extends Box(type) {
}

class TextSample {
    unsigned int(16)       text-length;
    unsigned int(8)        text[text-length];
    TextSampleModifierBox  text-modifier[];    // to end of the sample
}

The initial string is preceded by a 16-bit count of the number of bytes in the string. There is no need for null termination
of the text string. The sample size table provides the complete byte-count of each sample, including the trailing modifier 
boxes; by comparing the string length and the sample size, you can determine how much space, if any, is left for 
modifier boxes. 

Authors should limit the string in each text sample to not more than 2048 bytes, for maximum terminal interoperability. 

Any unrecognised box found in the text sample should be skipped and ignored, and processing continue as if it were not 
there. 
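
The sample layout described above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function name `parse_text_sample` is hypothetical, the text is assumed to be UTF-8, and only the ordinary box header of the ISO base media file format (32-bit size followed by a four-character type) is handled; the 64-bit and "to end" size variants are not.

```python
import struct

def parse_text_sample(sample: bytes):
    """Split one timed-text sample into its text and trailing modifier boxes.

    Returns (text, boxes), where boxes is a list of (type, payload) tuples.
    Assumption: the text is UTF-8 (no byte-order-mark handling is attempted).
    """
    # 16-bit big-endian byte count, then the text itself (no null terminator).
    (text_length,) = struct.unpack_from(">H", sample, 0)
    text = sample[2:2 + text_length].decode("utf-8")

    # Whatever the sample size table leaves after the text is modifier boxes.
    boxes = []
    pos = 2 + text_length
    while pos + 8 <= len(sample):
        size, box_type = struct.unpack_from(">I4s", sample, pos)
        if size < 8 or pos + size > len(sample):
            break  # malformed box: stop rather than read past the sample
        boxes.append((box_type.decode("latin-1"), sample[pos + 8:pos + size]))
        pos += size
    return text, boxes
```

A reader built on this would dispatch on the box type and simply ignore any type it does not recognise, which matches the skipping rule stated above.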

D.8a.17.1 Sample Modifier Boxes
D.8a.17.1.1 Text Style

'styl' 

This specifies the style of the text. It consists of a series of style records as defined above, preceded by a 16-bit count of 
the number of style records. Each record specifies the starting and ending character positions of the text to which it 
applies. The styles shall be ordered by starting character offset, and the starting offset of one style record shall be 
greater than or equal to the ending character offset of the preceding record; style records shall not overlap their
character ranges.

class TextStyleBox() extends TextSampleModifierBox('styl') {
    unsigned int(16) entry-count;
    StyleRecord      text-styles[entry-count];
}



D.8a.17.1.2 Highlight 

'hlit' - Specifies highlighted text: the box contains two 16-bit integers, the starting character to highlight, and the first 
character with no highlighting (e.g. values 4, 6 would highlight the two characters 4 and 5). The second value may be 
the number of characters in the text plus one, to indicate that the last character is highlighted. 

class TextHighlightBox() extends TextSampleModifierBox('hlit') {
    unsigned int(16) startcharoffset;
    unsigned int(16) endcharoffset;
}

class TextHilightColorBox() extends TextSampleModifierBox('hclr') {
    unsigned int(8) highlight_color_rgba[4];
}

highlight_color_rgba:

rgb color, 8 bits each of red, green, blue, and an alpha (transparency) value

The TextHilightColorBox may be present when the TextHighlightBox or TextKaraokeBox is present in a text sample.
It is recommended that terminals use the following rules to determine the displayed effect when highlight is requested: 

a) if a highlight colour is not specified, then the text is highlighted using a suitable technique such as inverse video: 
both the text colour and the background colour change. 

b) if a highlight colour is specified, the background colour is set to the highlight colour for the highlighted 
characters; the text colour does not change. 

Terminals do not need to handle text that is both scrolled and either statically or dynamically highlighted. Content 
authors should avoid specifying both scroll and highlight for the same sample. 

D.8a.17.1.3 Dynamic Highlight 

'krok' - Karaoke, closed caption, or dynamic highlighting. The number of highlight events is specified, and each event
is specified by a starting and ending character offset and an end time for the event. The start time is either the sample
start time or the end time of the previous event. The specified characters are highlighted from the previous end-time
(initially the beginning of this sample's time), to the end time. The times are all specified relative to the sample's time;
that is, a time of 0 represents the beginning of the sample time. The times are measured in the timescale of the track.

The box starts with the start-time offset of the first highlight event, a 16-bit count of the highlight events, and then that
number of 8-byte records. Each record contains the end-time offset as a 32-bit number, and the text start and end
values, each as a 16-bit number. These values are specified as in the highlight record - the offset of the first character to
highlight, and the offset of the first character not highlighted. The special case where startcharoffset equals
endcharoffset can be used to pause during or at the beginning of dynamic highlighting. The records shall be ordered
and not overlap, as in the highlight record. The time in each record is the end time of this highlight event; the first
highlight event starts at the indicated start-time offset from the start time of the sample. The time values are in the units
expressed by the timescale of the track. The time values shall not exceed the duration of the sample.

The continuouskaraoke flag controls whether to highlight only those characters (continuouskaraoke = 0) selected by a 
karaoke entry, or the entire string from the beginning up to the characters highlighted (continuouskaraoke = 1) at any 
given time. In other words, the flag specifies whether karaoke should ignore the starting offset and highlight all text 
from the beginning of the sample to the ending offset. 

Karaoke highlighting is usually achieved by using the highlight colour as the text colour, without changing the 
background. 

At most one dynamic highlight ('krok') box may occur in a sample. 

class TextKaraokeBox() extends TextSampleModifierBox('krok') {
    unsigned int(32) highlight-start-time;
    unsigned int(16) entry-count;
    for (i=1; i<=entry-count; i++) {
        unsigned int(32) highlight-end-time;
        unsigned int(16) startcharoffset;
        unsigned int(16) endcharoffset;
    }
}
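
As an illustration of the dynamic highlight rules, the following Python sketch reads the payload of a 'krok' box and reports which characters are highlighted at a given time. The function names are hypothetical, the payload is assumed to exclude the 8-byte box header, and the `continuous` argument stands in for the continuouskaraoke flag; treating an empty range as "nothing highlighted" is an interpretation of the pause case, not a statement of the specification.

```python
import struct

def parse_krok(payload: bytes):
    """Return (highlight_start_time, [(end_time, start_char, end_char), ...])."""
    start_time, entry_count = struct.unpack_from(">IH", payload, 0)
    events = []
    pos = 6
    for _ in range(entry_count):
        # Each record is 8 bytes: 32-bit end time, two 16-bit character offsets.
        events.append(struct.unpack_from(">IHH", payload, pos))
        pos += 8
    return start_time, events

def highlighted_range(start_time, events, t, continuous=False):
    """Return (first, first_not_highlighted) at time t, or None if nothing is lit.

    Times are in track timescale units relative to the start of the sample.
    """
    begin = start_time
    for end_time, start_char, end_char in events:
        if begin <= t < end_time:
            first = 0 if continuous else start_char
            # An empty range (e.g. startcharoffset == endcharoffset) is a pause.
            return None if first == end_char else (first, end_char)
        begin = end_time  # the next event starts where this one ended
    return None
```

With two events ending at 100 and 250 that cover characters 0-4 and 5-8, a query at time 100 yields (5, 9), or (0, 9) when the continuous mode is requested.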



D.8a.17.1.4 Scroll Delay 

'dlay' - Specifies a delay after a Scroll In and/or before Scroll Out. A 32-bit integer specifying the delay, in the units of 
the timescale of the track. The default delay, in the absence of this box, is 0. 

class TextScrollDelayBox() extends TextSampleModifierBox('dlay') {
    unsigned int(32) scroll-delay;
}



D.8a.17.1.5 HyperText 

'href' - HyperText link. The existence of the hypertext link is visually indicated in a suitable style (e.g. underlined blue
text).

This box contains these values: 

startCharOffset: - the start offset of the text to be linked 

endCharOffset: - the end offset of the text (start offset + number of characters) 

URLLength:- the number of bytes in the following URL 

URL: UTF-8 characters - the linked-to URL 

altLength:- the number of bytes in the following "alt" string 

altstring: UTF-8 characters - an "alt" string for user display 

The URL should be an absolute URL, as the context for a relative URL may not always be clear.

The "alt" string may be used as a tool-tip or other visual clue, as a substitute for the URL, if desired by the terminal, to
display to the user as a hint on where the link refers.

Hypertext-linked text should not be scrolled; not all terminals can display this or manage the user interaction to
determine whether the user has interacted with moving text. It is also hard for the user to interact with scrolling text.

class TextHyperTextBox() extends TextSampleModifierBox('href') {
    unsigned int(16) startcharoffset;
    unsigned int(16) endcharoffset;
    unsigned int(8)  URLLength;
    unsigned int(8)  URL[URLLength];
    unsigned int(8)  altLength;
    unsigned int(8)  altstring[altLength];
}
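
The two length-prefixed strings of the 'href' box can be read as in the following Python sketch. The function name `parse_href` is hypothetical and the payload is assumed to exclude the 8-byte box header.

```python
import struct

def parse_href(payload: bytes):
    """Return (start_char, end_char, url, alt) from a 'href' box payload."""
    start_char, end_char, url_length = struct.unpack_from(">HHB", payload, 0)
    pos = 5
    url = payload[pos:pos + url_length].decode("utf-8")
    pos += url_length
    # The "alt" string uses the same 8-bit length prefix as the URL.
    alt_length = payload[pos]
    alt = payload[pos + 1:pos + 1 + alt_length].decode("utf-8")
    return start_char, end_char, url, alt
```

Because both lengths are 8-bit byte counts, neither the URL nor the "alt" string can exceed 255 bytes.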



D.8a.17.1.6 Textbox 

'tbox' - text box over-ride. This over-rides the default text box set in the sample description. 

class TextboxBox() extends TextSampleModifierBox('tbox') {
    BoxRecord text-box;
}



D.8a.17.1.7 Blink 

'blnk' - Blinking text. This requests blinking text for the indicated character range. Terminals are not required to 
support blinking text, and the precise way in which blinking is achieved, and its rate, is terminal-dependent. 

class BlinkBox() extends TextSampleModifierBox('blnk') {
    unsigned int(16) startcharoffset;
    unsigned int(16) endcharoffset;
}



D.8a.18 Combinations of features

Two modifier boxes of the same type shall not be applied to the same character (e.g. it is not permitted to have two href
links from the same text). As the 'hclr', 'dlay' and 'tbox' are globally applied to the whole text in a sample, each sample
shall contain at most one 'hclr', at most one 'dlay', and at most one 'tbox' modifier.

Table D.8 details the effects of multiple options: 

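Table D.8: Combinations of features

                |                  First sample modifier box
Second sample   | Sample description | styl | hlit | krok | href | blnk
modifier box    | style record       |      |      |      |      |
----------------+--------------------+------+------+------+------+------
styl            | 1                  | 3    |      |      |      |
hlit            |                    |      | 3    |      |      |
krok            |                    |      | 4    | 3    |      |
href            | 2                  | 2    |      | 5    | 3    |
blnk            |                    | 6    | 6    | 6    | 6    | 6
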

1. The sample description provides the default style; the style records over-ride this for the selected characters.

2. The terminal over-rides the chosen style for HREF links.

3. Two records of the same type cannot be applied to the same character.

4. Dynamic and static highlighting must not be applied to the same text.

5. Dynamic highlighting and linking must not be applied to the same text.

6. Blinking text is optional, particularly when requested in combination with other features.

D.9 File Identification



3GPP multimedia files can be identified using several mechanisms. When stored in traditional computer file systems, 
these files should be given the file extension ".3gp" (readers should allow mixed case for the alphabetic characters). 
The MIME types "video/3gpp" (for visual or audio/visual content, where visual includes both video and timed text) and 
"audio/3gpp" (for purely audio content) are expected to be registered and used. 

A file-type box, as defined in the ISO base media file format specification [50] shall be present in conforming files. The 
file type box 'ftyp' shall occur before any variable-length box (e.g. movie, free space, media data). Only a fixed-size 
box such as a file signature, if required, may precede it. 

The brand identifier for this specification is '3gp5'. This brand identifier must occur in the compatible brands list, and 
may also be the primary brand. If the file is also conformant to release 4 of this specification, it is recommended that 
the Release 4 brand '3gp4' also occur in the compatible brands list; if 3gp4 is not in the compatible brand list the file 
will not be processed by a Release 4 reader. Readers should check the compatible brands list for the identifiers they 
recognize, and not rely on the file having a particular primary brand, for maximum compatibility. Files may be 
compatible with more than one brand, and have a 'best use' other than this specification, yet still be compatible with this 
specification. 

Table D.9: The File-Type box

Field            | Type             | Details                              | Value
-----------------+------------------+--------------------------------------+--------
BoxHeader.Size   | Unsigned int(32) |                                      |
BoxHeader.Type   | Unsigned int(32) |                                      | 'ftyp'
Brand            | Unsigned int(32) | The major or 'best use' of this file |
MinorVersion     | Unsigned int(32) |                                      |
CompatibleBrands | Unsigned int(32) | A list of brands, to end of the box  |


Brand: Identifies the 'best use' of this file. The brand should match the file extension. For files with extension '.3gp' 
and conforming to this specification, the brand shall be '3gp5'. 

Minor Version: This identifies the minor version of the brand. For files with brand '3gpZ', where Z is a digit, and 
conforming to release Z.x.y, this field takes the value x*256 + y. 

CompatibleBrands: a list of brand identifiers (to the end of the box). '3gp5' shall be a member of this list. 
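
A reader-side check of the file-type box might look like the following Python sketch. It assumes the field layout of Table D.9 and that the box sits at the very start of the supplied bytes; the function names are hypothetical and not taken from the specification.

```python
import struct

def minor_version(x: int, y: int) -> int:
    """Minor version for a file conforming to release Z.x.y: x*256 + y."""
    return x * 256 + y

def parse_ftyp(data: bytes):
    """Return (brand, minor_version, compatible_brands) from a file-type box."""
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"ftyp":
        raise ValueError("not a file-type box")
    brand, minor = struct.unpack_from(">4sI", data, 8)
    # The compatible brands are 4-byte identifiers running to the end of the box.
    compatible = [data[i:i + 4].decode("latin-1") for i in range(16, size - 3, 4)]
    return brand.decode("latin-1"), minor, compatible

def is_3gp5_compatible(data: bytes) -> bool:
    """Check the compatible brands list rather than relying on the primary brand."""
    return "3gp5" in parse_ftyp(data)[2]
```

Checking the list instead of the primary brand follows the recommendation above that readers should not rely on a file having a particular primary brand.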
Annex E (normative):
RTP payload format and file storage format for AMR and AMR-WB audio

The AMR and AMR-WB speech codec RTP payload, storage format and MIME type registration are specified in [11].

Annex F (normative):
RDF schema for the PSS base vocabulary

<?xml version="1.0"?>

<!--
This document is the RDF Schema for streaming-specific vocabulary
as defined in 3GPP TS 26.234 Rel.5 (in the following "the
specification").

The URI for unique identification of this RDF Schema is
http://www.3gpp.org/profiles/PSS/ccppschema-PSS5

This RDF Schema includes the same information as the respective
chapter of the specification. Greatest care has been taken to keep
the two documents consistent. However, in case of any divergence
the specification takes precedence.

All references in this RDF Schema are to be interpreted relative to
the specification. This means all references using the form
[ref] are defined in chapter 2 "References" of the
specification. All other references refer to parts within that
document.

Note: This Schema has been aligned in structure and base
vocabulary to the RDF Schema used by UAProf [40].
-->

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema">

<!-- ****************************************************************** -->
<!-- ***** Properties shared among the components ***** -->

<rdf:Description ID="defaults">
  <rdfs:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="Streaming"/>
  <rdfs:comment>
    An attribute used to identify the default capabilities.
  </rdfs:comment>
</rdf:Description>

<!-- ****************************************************************** -->
<!-- ***** Component Definitions ***** -->

<rdf:Description ID="Streaming">
  <rdf:type resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.wapforum.org/UAPROF/ccppschema-20010330#Component"/>
  <rdfs:label>Component: Streaming</rdfs:label>
  <rdfs:comment>
    The Streaming component specifies the base vocabulary for
    PSS. PSS servers supporting capability exchange should
    understand the attributes in this component as explained in
    detail in 3GPP TS 26.234 Rel. 5.
  </rdfs:comment>
</rdf:Description>

<!-- **
** In the following property definitions, the defined types
** are as follows:
**
** Number:    A positive integer
**            [0-9]+
** Boolean:   A yes or no value
**            Yes|No
** Literal:   An alphanumeric string
**            [A-Za-z0-9/.\-_]+
** Dimension: A pair of numbers
**            [0-9]+x[0-9]+
** -->

<!-- ****************************************************************** -->
<!-- ***** Component: Streaming ***** -->

<rdf:Description ID="AudioChannels">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: This attribute describes the stereophonic capability of the natural audio device.
    The only legal values are "Mono" and "Stereo".

    Type:       Literal
    Resolution: Locked
    Examples:   "Mono", "Stereo"
  </rdfs:comment>
</rdf:Description>

<rdf:Description ID="VideoPreDecoderBufferSize">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: This attribute signals if the optional video
    buffering requirements defined in Annex G are supported. It also
    defines the size of the hypothetical pre-decoder buffer defined in
    Annex G. A value equal to zero means that Annex G is not
    supported. A value equal to one means that Annex G is
    supported. In this case the size of the buffer is the default size
    defined in Annex G. A value equal to or greater than the default
    buffer size defined in Annex G means that Annex G is supported and
    sets the buffer size to the given number of octets. Legal values are all
    integer values equal to or greater than zero. Values greater than
    one but less than the default buffer size defined in Annex G are
    not allowed.

    Type:       Number
    Resolution: Locked
    Examples:   "0", "4096"
  </rdfs:comment>
</rdf:Description>

<rdf:Description ID="VideoInitialPostDecoderBufferingPeriod">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: If Annex G is not supported, the attribute has no
    meaning. If Annex G is supported, this attribute defines the
    maximum initial post-decoder buffering period of video. Values are
    interpreted as clock ticks of a 90-kHz clock. In other words, the
    value is incremented by one for each 1/90 000 seconds. For
    example, the value 9000 corresponds to 1/10 of a second initial
    post-decoder buffering. Legal values are all integer values equal
    to or greater than zero.

    Type:       Number
    Resolution: Locked
    Examples:   <VideoInitialPostDecoderBufferingPeriod>
                  9000
                </VideoInitialPostDecoderBufferingPeriod>
  </rdfs:comment>
</rdf:Description>

<rdf:Description ID="VideoDecodingByteRate">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: If Annex G is not supported, the attribute has no meaning. If Annex G is supported,
    this attribute defines the peak decoding byte rate the PSS client is able to support. In other
    words, the PSS client fulfils the requirements given in Annex G with the signalled peak decoding
    byte rate. The values are given in bytes per second and shall be greater than or equal to 8000.
    According to Annex G, 8000 is the default peak decoding byte rate for the mandatory video codec
    profile and level (H.263 Profile 0 Level 10). Legal values are integer values greater than or equal
    to 8000.

    Type:       Number
    Resolution: Locked
    Examples:   <VideoDecodingByteRate>16000</VideoDecodingByteRate>
  </rdfs:comment>
</rdf:Description>

<rdf:Description ID="MaxPolyphony">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: The MaxPolyphony attribute refers to the maximal polyphony
    that the synthetic audio device supports as defined in [44]. Legal values are integers
    between 5 and 24.
    NOTE: The MaxPolyphony attribute can be used to signal the maximum polyphony capabilities
          supported by the PSS client. This is a complementary mechanism for the delivery of
          compatible SP-MIDI content and thus the PSS client is required to support Scalable
          Polyphony MIDI, i.e. Channel Masking defined in [44].

    Type:       Number
    Resolution: Locked
    Examples:   <MaxPolyphony>8</MaxPolyphony>
  </rdfs:comment>
</rdf:Description>

<rdf:Description ID="PssAccept">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Bag"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: List of content types (MIME types) the PSS
    application supports. Both CcppAccept (SoftwarePlatform, UAProf)
    and PssAccept can be used but if PssAccept is defined it has
    precedence over CcppAccept and a PSS application shall then use
    PssAccept.

    Type:       Literal (bag)
    Resolution: Append
    Examples:   "audio/AMR-WB;octet-alignment, application/smil"
  </rdfs:comment>
</rdf:Description>

<rdf:Description ID="PssAccept-Subset">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Bag"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: List of content types for which the PSS application
    supports a subset. MIME types can in most cases effectively be
    used to express variations in support for different media
    types. Many MIME types, e.g. that of AMR-NB, have several parameters that
    can be used for this purpose. There may exist content types for
    which the PSS application only supports a subset and this subset
    cannot be expressed with MIME-type parameters. In these cases the
    attribute PssAccept-Subset is used to describe support for a
    subset of a specific content type. If a subset of a specific
    content type is declared in PssAccept-Subset, this means that
    PssAccept-Subset has precedence over both PssAccept and CcppAccept.
    PssAccept and/or CcppAccept shall always include the corresponding
    content types for which PssAccept-Subset specifies subsets.
    This is to ensure compatibility with those content servers that
    do not understand the PssAccept-Subset attribute but do understand e.g. CcppAccept.

    This is illustrated with an example. If PssAccept="audio/AMR",
    "image/jpeg" and PssAccept-Subset="JPEG-PSS" then "audio/AMR"
    and JPEG baseline are supported. "image/jpeg" in PssAccept is of no
    importance since it is related to "JPEG-PSS" in PssAccept-Subset.

    Subset identifiers and corresponding semantics shall only be defined by
    the TSG responsible for the present document. The following values are defined:
    - "JPEG-PSS": Only the two JPEG modes described in clause 7.5 of the present
      document are supported.
    - "SVG-Tiny"
    - "SVG-Basic"
    Legal values are subset identifiers defined by the specification.

    Type:       Literal (bag)
    Resolution: Locked
    Examples:   "JPEG-PSS", "SVG-Tiny", "SVG-Basic"
  </rdfs:comment>
</rdf:Description>



<rdf:Description ID="PssVersion">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: Latest PSS version supported by the client. Legal
    values are "3GPP-R4", "3GPP-R5" and so forth.

    Type:       Literal
    Resolution: Locked
    Examples:   "3GPP-R4", "3GPP-R5"
  </rdfs:comment>
</rdf:Description>



<rdf:Description ID="RenderingScreenSize">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: The rendering size of the device's screen in units of
    pixels. The horizontal size is given followed by the vertical
    size. Legal values are pairs of integer values equal to or greater
    than zero. A value equal to "0x0" means that no display exists or
    that only textual output is supported.

    Type:       Dimension
    Resolution: Locked
    Examples:   "160x120"
  </rdfs:comment>
</rdf:Description>



<rdf:Description ID="SmilBaseSet">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: Indicates a base set of SMIL 2.0 modules that the
    client supports. Legal values are the following pre-defined
    identifiers: "SMIL-3GPP-R4" indicates all SMIL 2.0
    modules required for scene description support according to clause
    8 of Release 4 of TS 26.234. "SMIL-3GPP-R5" indicates all SMIL 2.0
    modules required for scene description support according to clause
    8 of the specification.

    Type:       Literal
    Resolution: Locked
    Examples:   "SMIL-3GPP-R4", "SMIL-3GPP-R5"
  </rdfs:comment>
</rdf:Description>



<rdf:Description ID="SmilModules">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Bag"/>
  <rdfs:domain rdf:resource="#Streaming"/>
  <rdfs:comment>
    Description: This attribute defines a list of SMIL 2.0 modules
    supported by the client. If the SmilBaseSet is used those modules
    do not need to be explicitly listed here. In that case only
    additional module support needs to be listed. Legal values are all
    SMIL 2.0 module names defined in the SMIL 2.0 recommendation [31],
    section 2.3.3, table 2.

    Type:       Literal (bag)
    Resolution: Locked
    Examples:   "BasicTransitions, MultiArcTiming"
  </rdfs:comment>
</rdf:Description>

</rdf:RDF>
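
The value types named in the schema comment (Number, Boolean, Literal, Dimension) are given as regular expressions, so a server could check received attribute values along the lines of the following Python sketch. The helper names are hypothetical, and the second function encodes only the legal-value rule stated for VideoPreDecoderBufferSize, with the applicable Annex G default size passed in by the caller.

```python
import re

# Patterns as listed in the schema comment; each must match the whole value.
TYPE_PATTERNS = {
    "Number": re.compile(r"[0-9]+"),
    "Boolean": re.compile(r"Yes|No"),
    "Literal": re.compile(r"[A-Za-z0-9/.\-_]+"),
    "Dimension": re.compile(r"[0-9]+x[0-9]+"),
}

def is_valid(type_name: str, value: str) -> bool:
    """Return True if value is a well-formed instance of the named type."""
    return TYPE_PATTERNS[type_name].fullmatch(value) is not None

def predecoder_buffer_size_is_legal(value: int, default_size: int) -> bool:
    """0 = Annex G unsupported, 1 = default size, otherwise at least the default."""
    return value in (0, 1) or value >= default_size
```

For example, "160x120" is accepted as a Dimension while "160" is not, since the pattern requires both numbers.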
Annex G (normative):
Buffering of video

G.1 Introduction 

This annex describes video buffering requirements in the PSS. As defined in clause 7.4 of the present document, 
support for the annex is optional and may be signalled in the PSS capability exchange and in the SDP. This is described 
in clause 5.2 and clause 5.3.3 of the present document. When the annex is in use, the content of the annex is normative. 
In other words, PSS clients shall be capable of receiving an RTP packet stream that complies with the specified 
buffering model and PSS servers shall verify that the transmitted RTP packet stream complies with the specified 
buffering model. 



G.2 PSS Buffering Parameters 



The behaviour of the PSS buffering model is controlled with the following parameters: the initial pre-decoder buffering 
period, the initial post-decoder buffering period, the size of the hypothetical pre-decoder buffer, the peak decoding byte 
rate, and the decoding macroblock rate. The default values of the parameters are defined below. 

The default initial pre-decoder buffering period is 1 second. 

The default initial post-decoder buffering period is zero. 

The default size of the hypothetical pre-decoder buffer is defined according to the maximum video bit-rate 
according to the table below: 

Table G.1: Default size of the hypothetical pre-decoder buffer

Maximum video bit-rate | Default size of the hypothetical pre-decoder buffer
-----------------------+-----------------------------------------------------
65536 bits per second  | 20480 bytes
131072 bits per second | 40960 bytes
Undefined              | 51200 bytes



The maximum video bit-rate can be signalled in the media-level bandwidth attribute of SDP as defined in clause 
5.3.3 of this document. If the video-level bandwidth attribute was not present in the presentation description, the 
maximum video bit-rate is defined according to the video coding profile and level in use. 

The size of the hypothetical post-decoder buffer is an implementation-specific issue. The buffer size can be 
estimated from the maximum output data rate of the decoders in use and from the initial post-decoder buffering 
period. 

By default, the peak decoding byte rate is defined according to the video coding profile and level in use. For
example, H.263 Level 10 requires support for bit-rates up to 64000 bits per second. Thus, the peak decoding byte
rate equals 8000 bytes per second.

The default decoding macroblock rate is defined according to the video coding profile and level in use. If
MPEG-4 Visual is in use, the default macroblock rate equals the VCV decoder rate. If H.263 is in use, the default
macroblock rate equals (1 / minimum picture interval) multiplied by the number of macroblocks in the maximum
picture format. For example, H.263 Level 10 requires support for picture formats up to QCIF and a minimum
picture interval down to 2002 / 30000 sec. Thus, the default macroblock rate would be 30000 x 99 / 2002 ≈ 1484
macroblocks per second.
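
The two H.263 Level 10 figures quoted above can be reproduced with a few lines of Python. This is only a worked check of the arithmetic; the function names are hypothetical, and the QCIF picture size of 176 x 144 luminance samples with 16 x 16 macroblocks is assumed as background knowledge rather than taken from this annex.

```python
from fractions import Fraction

def peak_decoding_byte_rate(max_bitrate_bps: int) -> Fraction:
    """Default peak decoding byte rate: the maximum bit-rate expressed in bytes."""
    return Fraction(max_bitrate_bps, 8)

def h263_default_macroblock_rate(macroblocks_per_picture: int,
                                 min_picture_interval: Fraction) -> Fraction:
    """(1 / minimum picture interval) times macroblocks in the largest picture."""
    return macroblocks_per_picture / min_picture_interval

# QCIF is 176 x 144 pixels, i.e. 11 x 9 = 99 macroblocks of 16 x 16 pixels.
QCIF_MACROBLOCKS = (176 // 16) * (144 // 16)

print(peak_decoding_byte_rate(64000))  # 8000
rate = h263_default_macroblock_rate(QCIF_MACROBLOCKS, Fraction(2002, 30000))
print(round(rate))  # 1484
```

The exact macroblock rate is 2970000/2002, roughly 1483.5, which the text rounds to 1484.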

PSS clients may signal their capability of providing larger buffers and faster peak decoding byte rates in the capability
exchange process described in clause 5.2 of the present document. The average coded video bit-rate should be smaller
than or equal to the bit-rate indicated by the video coding profile and level in use, even if a faster peak decoding byte
rate were signalled.

Initial parameter values for each stream can be signalled within the SDP description of the stream. Signalled parameter
values override the corresponding default parameter values. The values signalled within the SDP description guarantee 
pauseless playback from the beginning of the stream until the end of the stream (assuming a constant-delay reliable 
transmission channel). 

PSS servers may update parameter values in the response for an RTSP PLAY request. If an updated parameter value is 
present, it shall replace the value signalled in the SDP description or the default parameter value in the operation of the 
PSS buffering model. An updated parameter value is valid only in the indicated playback range, and it has no effect 
after that. Assuming a constant-delay reliable transmission channel, the updated parameter values guarantee pauseless 
playback of the actual range indicated in the response for the PLAY request. The indicated pre-decoder buffer size and 
initial post-decoder buffering period shall be smaller than or equal to the corresponding values in the SDP description or 
the corresponding default values, whichever ones are valid. The following header fields are defined for RTSP: 

x-predecbufsize:<size of the hypothetical pre-decoder buffer> 

This gives the suggested size of the Annex G hypothetical pre-decoder buffer in bytes. 

x-initpredecbufperiod:<initial pre-decoder buffering period> 

This gives the required initial pre-decoder buffering period specified according to Annex G. Values are 
interpreted as clock ticks of a 90-kHz clock. That is, the value is incremented by one for each 1/90 000 seconds. 
For example, value 180 000 corresponds to a two second initial pre-decoder buffering. 

x-initpostdecbufperiod:<initial post-decoder buffering period> 

This gives the required initial post-decoder buffering period specified according to Annex G. Values are
interpreted as clock ticks of a 90-kHz clock.

These header fields are defined for the response of an RTSP PLAY request only. Their use is optional. 

The following example plays the whole presentation starting at SMPTE time code 0:10:20 until the end of the clip. The
playback is to start at 15:36 on 23 Jan 1997. The suggested initial pre-decoder buffering period is half a second.

C->S: PLAY rtsp://audio.example.com/twister.en RTSP/1.0
      CSeq: 833
      Session: 12345678
      Range: smpte=0:10:20-;time=19970123T153600Z
      User-Agent: TheStreamClient/1.1b2

S->C: RTSP/1.0 200 OK
      CSeq: 833
      Date: 23 Jan 1997 15:35:06 GMT
      Range: smpte=0:10:22-;time=19970123T153600Z
      x-initpredecbufperiod: 45000
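
On the client side, the three header fields can be picked out of a PLAY response and the tick values converted to seconds as in this Python sketch. The function names are hypothetical, header names are matched case-insensitively as an assumption, and no other RTSP parsing is attempted.

```python
CLOCK_HZ = 90_000  # the buffering periods are expressed in ticks of a 90-kHz clock

BUFFERING_HEADERS = {
    "x-predecbufsize",          # bytes
    "x-initpredecbufperiod",    # 90-kHz ticks
    "x-initpostdecbufperiod",   # 90-kHz ticks
}

def ticks_to_seconds(ticks: int) -> float:
    """Convert a 90-kHz tick count to seconds."""
    return ticks / CLOCK_HZ

def parse_buffering_headers(response: str) -> dict:
    """Return the Annex G header fields found in an RTSP response as integers."""
    found = {}
    for line in response.splitlines():
        name, separator, value = line.partition(":")
        key = name.strip().lower()
        if separator and key in BUFFERING_HEADERS:
            found[key] = int(value.strip())
    return found
```

Applied to the response in the example, this yields a value of 45000 for x-initpredecbufperiod, which corresponds to 0.5 seconds.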



G.3 PSS server buffering verifier 



The PSS server buffering verifier is specified according to the PSS buffering model. The model is based on two buffers 
and two timers. The buffers are called the hypothetical pre-decoder buffer and the hypothetical post-decoder buffer. The 
timers are named the decoding timer and the playback timer. 

The PSS buffering model is presented below. 

1. The buffers are initially empty.

2. A PSS Server adds each transmitted RTP packet having video payload to the pre-decoder buffer immediately 
when it is transmitted. All protocol headers at RTP or any lower layer are removed. 

3. Data is not removed from the pre-decoder buffer during a period called the initial pre-decoder buffering period. 
The period starts when the first RTP packet is added to the buffer. 

4. When the initial pre-decoder buffering period has expired, the decoding timer is started from a position indicated 
in the previous RTSP PLAY request. 

5. Removal of a video frame is started when both of the following two conditions are met: First, the decoding timer 
has reached the scheduled playback time of the frame. Second, the previous video frame has been totally 
removed from the pre-decoder buffer. 

6. The duration of frame removal is the larger one of the two candidates: The first candidate is equal to the number 
of macroblocks in the frame divided by the decoding macroblock rate. The second candidate is equal to the 
number of bytes in the frame divided by the peak decoding byte rate. When the coded video frame has been 
removed from the pre-decoder buffer entirely, the corresponding uncompressed video frame is placed into the 
post-decoder buffer. 

7. Data is not removed from the post-decoder buffer during a period called the initial post-decoder buffering period. 
The period starts when the first frame has been placed into the post-decoder buffer. 

8. When the initial post-decoder buffering period has expired, the playback timer is started from the position 
indicated in the previous RTSP PLAY request. 

9. A frame is removed from the post-decoder buffer immediately when the playback timer reaches the scheduled 
playback time of the frame. 

10. Each RTSP PLAY request resets the PSS buffering model to its initial state. 

A PSS server shall verify that a transmitted RTP packet stream complies with the following requirements: 

The PSS buffering model shall be used with the default or signalled buffering parameter values. Signalled 
parameter values override the corresponding default parameter values. 

The occupancy of the hypothetical pre-decoder buffer shall not exceed the default or signalled buffer size. 

Each frame shall be inserted into the hypothetical post-decoder buffer before or on its scheduled playback time. 
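
The buffering model and the requirements above can be sketched as a small offline check. This is an illustration under stated assumptions, not a normative verifier: the names Frame, Params and verify are hypothetical; frames are listed in decoding order; playback times are given in seconds relative to the position of the RTSP PLAY request; and the bytes of a frame are assumed to leave the pre-decoder buffer only when its removal ends, which is a conservative reading that the text does not spell out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    playback_time: float               # scheduled playback time, s from PLAY position
    macroblocks: int                   # macroblocks in the coded frame
    packets: List[Tuple[float, int]]   # (transmit time in s, payload bytes) per packet


@dataclass
class Params:
    init_predec_period: float    # initial pre-decoder buffering period, s
    init_postdec_period: float   # initial post-decoder buffering period, s
    predec_buffer_size: int      # pre-decoder buffer size, bytes
    mb_rate: float               # decoding macroblock rate, macroblocks/s
    peak_byte_rate: float        # peak decoding byte rate, bytes/s


def verify(frames: List[Frame], p: Params) -> List[str]:
    """Return a list of violations; an empty list means the stream passed."""
    violations = []
    first_tx = min(t for f in frames for t, _ in f.packets)
    decode_start = first_tx + p.init_predec_period            # steps 3 and 4
    events = [(t, 1, size) for f in frames for t, size in f.packets]

    removal_ends = []
    prev_end = decode_start
    for f in frames:
        start = max(decode_start + f.playback_time, prev_end)  # step 5
        size = sum(s for _, s in f.packets)
        duration = max(f.macroblocks / p.mb_rate,              # step 6
                       size / p.peak_byte_rate)
        prev_end = start + duration
        removal_ends.append(prev_end)
        events.append((prev_end, 0, -size))  # assumption: bytes leave at removal end

    # Pre-decoder occupancy; at equal times removals (key 0) sort before additions.
    occupancy = 0
    for t, _, delta in sorted(events):
        occupancy += delta
        if occupancy > p.predec_buffer_size:
            violations.append(f"pre-decoder overflow at t={t:.3f}s ({occupancy} bytes)")
            break

    playback_start = removal_ends[0] + p.init_postdec_period   # steps 7 and 8
    for i, (f, end) in enumerate(zip(frames, removal_ends)):
        if end > playback_start + f.playback_time:             # deadline from step 9
            violations.append(f"frame {i} decoded late")
    return violations
```

Calling verify with a larger initial post-decoder buffering period moves every playback deadline later, which is how a stream with one slow-to-decode frame can still pass the check.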



G.4 PSS client buffering requirements 

When the annex is in use, the PSS client shall be capable of receiving an RTP packet stream that complies with the PSS 
server buffering verifier, when the RTP packet stream is carried over a constant-delay reliable transmission channel. 
Furthermore, the video decoder of the PSS client, which may include handling of post-decoder buffering, shall output 
frames at the correct rate defined by the RTP time-stamps of the received packet stream. 
Annex H (informative): 

Content creator guidelines for the synthetic audio medium type 

It is recommended that the first element of the MIP (Maximum Instantaneous Polyphony) message of SP-MIDI 
content intended for synthetic audio PSS/MMS be no more than 5. For instance, the MIP figures {4, 9, 
10, 12, 12, 16, 17, 20, 26, 26, 26} comply with the recommendation, whereas {6, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26} 
do not. 
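
The recommendation can be expressed as a one-line check. The function name is hypothetical, and the MIP message is assumed to be already decoded into a list of integers.

```python
def mip_complies(mip_values):
    """True when the first MIP value is no more than 5, as recommended above."""
    if not mip_values:
        raise ValueError("MIP message has no entries")
    return mip_values[0] <= 5


# The two MIP figure lists quoted in the text above.
compliant = mip_complies([4, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26])
non_compliant = mip_complies([6, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26])
```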
Annex I (informative): 

SP-MIDI Device 5-24 Note Profile for 3GPP, SP-MIDI implementation guideline using non-compliant hardware 

I.1 Introduction 

This informative annex describes implementation guidelines for the SP-MIDI Device 5-24 Note Profile for 
3GPP [45]. These guidelines allow manufacturers to develop early SP-MIDI 
implementations using MIDI hardware available at the time of the approval of release 5. These guidelines are valid only 
for release 5 implementations of SP-MIDI and are expected to be removed. It should be noted that these guidelines may 
reduce the musical performance of the synthesiser depending on the content and should be used with extreme caution. 



I.2 Guidelines 

I.2.1 Support of multiple rhythm channels 

Scalable Polyphony synthesisers conformant to this Profile shall support at least two MIDI Channels that can function 
as Rhythm Channels, to enable a fluent scalable polyphony implementation. 

If two rhythm Channels are not natively supported by the MIDI hardware, the SP-MIDI player could redirect the 
events intended for the additional rhythm channels to the default rhythm channel (MIDI channel 10). The rendering 
of the SP-MIDI content should not be affected unless different Channel settings (e.g. Channel Volume, Bank Setting, 
Panning etc.) are applied to the different rhythm Channels. It is recommended that only Channel settings intended for 
the default rhythm channel be applied. 
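
The redirection described above might look like the following sketch. The function name and the tuple representation of a MIDI event are hypothetical. Treating Control Change and Program Change messages as the "Channel settings" to be dropped on the additional rhythm channels is an assumption of this sketch, not a rule stated in this annex.

```python
DEFAULT_RHYTHM_CHANNEL = 9   # MIDI channel 10, zero-based in the status byte
CONTROL_CHANGE = 0xB0
PROGRAM_CHANGE = 0xC0


def redirect_rhythm_event(event, extra_rhythm_channels):
    """Map a channel voice message from an extra rhythm channel onto channel 10.

    `event` is a tuple of MIDI bytes starting with the status byte, and
    `extra_rhythm_channels` holds zero-based channel numbers. Returns the
    (possibly rewritten) event, or None when the event is a channel setting
    on an extra rhythm channel and is therefore dropped.
    """
    status = event[0]
    if status < 0x80 or status >= 0xF0:
        return event                       # not a channel voice message
    kind, channel = status & 0xF0, status & 0x0F
    if channel not in extra_rhythm_channels:
        return event                       # other channels pass through unchanged
    if kind in (CONTROL_CHANGE, PROGRAM_CHANGE):
        return None                        # assumption: treated as channel settings
    return (kind | DEFAULT_RHYTHM_CHANNEL,) + tuple(event[1:])
```

For example, with MIDI channel 11 (zero-based 10) acting as the additional rhythm channel, a Note On with status byte 0x9A is rewritten to status byte 0x99, while a Channel Volume message on that channel is discarded.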

I.2.2 Support of individual stereophonic panning 

When the stereophonic MIDI synthesiser cannot support individual stereophonic panning, central 
panning should be used as the default instead. 
Annex J (informative): 

Mapping of SDP parameters to UMTS QoS parameters 

This Annex gives recommendations for the mapping rules needed by the PSS applications to request the appropriate QoS 
from the UMTS network (see Table J.1). 

Table J.1: Mapping of SDP parameters to UMTS QoS parameters for PSS 



QoS parameter                     Parameter value                   Comment
--------------------------------  --------------------------------  -----------------------------------------
Delivery of erroneous SDUs        "No"
Delivery order                    "No"
Traffic class                     "Streaming class"
Maximum SDU size                  1400 bytes                        According to RFC 2460 the SDU size must
                                                                    not exceed 1500 octets. A packet size of
                                                                    1400 guarantees efficient transportation.
Guaranteed bit rate for downlink  1.025 * session bandwidth         This session bandwidth is calculated from
                                                                    the SDP media level bandwidth values.
Maximum bit rate for downlink     Equal or higher to guaranteed
                                  bit rate in downlink
Guaranteed bit rate for uplink    0.025 * session bandwidth
Maximum bit rate for uplink       Equal or higher to guaranteed
                                  bit rate in uplink
Residual BER                      1*10^-5                           16 bit CRC should be enough
SDU error ratio                   1*10^-4 or better
Traffic handling priority         Subscribed traffic handling       Ignored
                                  priority
Transfer delay                    2 sec.
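
The mapping in Table J.1 can be illustrated with a short sketch. The function names, the dictionary keys and the SDP fragment are hypothetical. Summing the media-level b=AS values (in kbit/s) to obtain the session bandwidth is an assumption, since the table states only that the session bandwidth is calculated from those values. The maximum bit rates are set equal to the guaranteed bit rates, which is the lowest value the table allows.

```python
def session_bandwidth_kbps(sdp: str) -> int:
    """Sum the media-level b=AS values (kbit/s) of an SDP description."""
    total, in_media = 0, False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            in_media = True                  # b= lines from here on are media level
        elif in_media and line.startswith("b=AS:"):
            total += int(line[len("b=AS:"):])
    return total


def umts_qos(session_kbps: float) -> dict:
    """Build the QoS parameter set of Table J.1 for a session bandwidth in kbit/s."""
    guaranteed_dl = 1.025 * session_kbps
    guaranteed_ul = 0.025 * session_kbps
    return {
        "delivery_of_erroneous_sdus": "No",
        "delivery_order": "No",
        "traffic_class": "Streaming class",
        "max_sdu_size_bytes": 1400,
        "guaranteed_bit_rate_dl_kbps": guaranteed_dl,
        "max_bit_rate_dl_kbps": guaranteed_dl,   # any value >= guaranteed is allowed
        "guaranteed_bit_rate_ul_kbps": guaranteed_ul,
        "max_bit_rate_ul_kbps": guaranteed_ul,   # any value >= guaranteed is allowed
        "residual_ber": 1e-5,
        "sdu_error_ratio": 1e-4,
        "transfer_delay_s": 2,
    }


# Illustrative, incomplete SDP fragment with one audio and one video stream.
EXAMPLE_SDP = """v=0
m=audio 0 RTP/AVP 97
b=AS:16
m=video 0 RTP/AVP 98
b=AS:64
"""

qos = umts_qos(session_bandwidth_kbps(EXAMPLE_SDP))
```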